Highly Available Docker Registry on AWS with Nexus

Have you ever wondered how to build a highly available & resilient Docker repository to store your Docker images?

In this post, we will set up an EC2 instance inside a Security Group and create an A record pointing to the server's Elastic IP address, as follows:

To provision the infrastructure, we will use Terraform as our IaC (Infrastructure as Code) tool. The advantage of this kind of tool is the ability to spin up a new environment quickly in a different AWS region (or with a different IaaS provider) in case of an incident (disaster recovery).

Start by cloning the following Github repository:

Inside the docker-registry folder, update the variables.tfvars file with your own AWS credentials (make sure you have the right IAM policies).

I specified a shell script to be used as user_data when launching the instance. It simply installs the latest version of Docker CE and switches the instance to Docker Swarm Mode (to benefit from replication & high availability of the Nexus container).

Note: You can also use a configuration management tool like Ansible or Chef to provision the server once it is created.

Then, issue the following command to create the infrastructure:

Once created, you should see the Elastic IP of your instance:

Connect to your instance via SSH:

Verify that the Docker Engine is running in Swarm Mode:

Check that the Nexus service is running:

If you go back to the AWS Management Console and navigate to the Route53 dashboard, you should see a new A record pointing to the instance IP address.

Point your favorite browser to the Nexus Dashboard URL (registry.slowcoder.com:8081). Login and create a Docker hosted registry as below:

Edit the /etc/docker/daemon.json file so that it has the following content:

Note: For production it’s highly recommended to secure your registry using a TLS certificate issued by a known CA.
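As a minimal sketch of that edit, assuming the required content is an `insecure-registries` entry (the Docker daemon setting that permits registries served over plain HTTP, which is consistent with the TLS note above), the Python helper below adds a registry to an existing configuration without discarding other keys. The helper name and the port `5000` are hypothetical; use the HTTP connector port you configured on the Nexus hosted repository.

```python
import json


def with_insecure_registry(config, registry):
    """Return a copy of a daemon.json dict with `registry` listed as insecure."""
    merged = dict(config)
    registries = list(merged.get("insecure-registries", []))
    if registry not in registries:  # avoid duplicate entries on repeated runs
        registries.append(registry)
    merged["insecure-registries"] = registries
    return merged


# Hypothetical address: the port must match the repository's HTTP connector.
updated = with_insecure_registry({}, "registry.slowcoder.com:5000")
print(json.dumps(updated, indent=2))
```

The printed JSON is what would be written to /etc/docker/daemon.json before restarting the daemon.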

Restart Docker for the changes to take effect:

Log in to your registry with your Nexus credentials (admin/admin123):

In order to push a new image to the registry:
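Pushing to a private registry relies on first tagging the image with the registry host, so that `docker push` knows where to send it. The sketch below only builds the two `docker tag` and `docker push` invocations; the registry port, repository name, and tag are placeholders rather than values from this setup.

```python
def push_commands(local_image, registry, repository, tag="latest"):
    """Return the docker CLI invocations needed to push a local image."""
    # A registry-qualified reference has the form HOST[:PORT]/REPOSITORY:TAG.
    target = f"{registry}/{repository}:{tag}"
    return [
        ["docker", "tag", local_image, target],
        ["docker", "push", target],
    ]


# Placeholder values for illustration only.
for command in push_commands("nginx:latest", "registry.slowcoder.com:5000", "nginx", "1.0"):
    print(" ".join(command))
```

On a host with Docker installed, each list could be passed to `subprocess.run(command, check=True)`.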

Verify that the image has been pushed to the remote repository:

To pull the Docker image:

Note: Sometimes you end up with many unused & dangling images that can quickly take up a significant amount of disk space:

You can either use the Nexus CLI tool or create a Nexus Task to clean up old Docker images:

Populate the form as below:

The task above will run every day at midnight to purge unused Docker images from the “mlabouardy” registry.

Continuous Monitoring with TICK stack

Monitoring your system is essential. It helps you detect issues before they cause major downtime that affects your customers and damages your business reputation. It also helps you plan growth based on the real usage of your system. But collecting metrics from different data sources isn’t enough: you need to tailor your monitoring to your own business needs and define the right alerts so that any abnormal change in the system is reported.

In this post, I will show you how to set up a resilient continuous monitoring platform with only open source projects & how to define an event alert to report changes in the system.

Clone the following Github repository:

1 – Terraform & AWS

In the tick-stack/terraform directory, update the variables.tfvars file with your own AWS credentials (make sure you have the right IAM policies):

Issue the following command to download the AWS provider plugin:

Issue the following command to provision the infrastructure:

2 – Ansible & Docker

Update the inventory file with your instance DNS name:

Then, install the Ansible custom role:

Execute the Ansible Playbook:

Point your browser to http://DNS_NAME:8083 and you should see the InfluxDB Admin Dashboard:

Now, create an InfluxDB Data Source in Chronograf (http://DNS_NAME:8888):

Create a new dashboard as follows:

You can create multiple graphs to visualize different types of metrics:

Note: For in-depth details on how to create interactive & dynamic dashboards in Chronograf, check my previous tutorial.

Collecting data is only useful if you act on it, for example by alerting. So make sure to enable Kapacitor:

Define a new alert to send a Slack notification if the CPU utilization is higher than 70%.
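The alert itself is defined in Kapacitor, but the rule it evaluates is simple enough to sketch in Python. The version below assumes the metrics come from Telegraf's `cpu` measurement, whose `usage_idle` field is the percentage of idle time, so utilization is `100 - usage_idle`. Averaging over a window of samples is an assumption of this sketch, not necessarily how the alert in this post is configured.

```python
def cpu_alert(usage_idle_samples, threshold=70.0):
    """Return True when mean CPU utilization over the window exceeds `threshold`."""
    if not usage_idle_samples:
        return False  # no data points: nothing to alert on
    utilization = [100.0 - idle for idle in usage_idle_samples]
    return sum(utilization) / len(utilization) > threshold


# Idle percentages of 20, 25 and 30 mean the CPU was 80, 75 and 70 percent busy.
print(cpu_alert([20.0, 25.0, 30.0]))
print(cpu_alert([50.0, 60.0]))
```

Averaging a window rather than reacting to a single sample keeps a short spike from triggering a notification.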

To test it out, we need to generate some workload. For this case, I used stress:

Stressing the CPU:

After a few seconds, you should receive a Slack notification.

Setting up an etcd cluster on AWS using CoreOS & Terraform

This post is part of the “IaC” series explaining how to use Infrastructure as Code concepts with Terraform. In this part, I will show you how to set up an etcd cluster on AWS using CoreOS & Terraform, as shown in the diagram below:

All the templates used in this demo can be found on my Github 😁.

So let’s start with the “variables.tf” file, which contains global variables such as the AWS region, the cluster instance type …

Note: As of writing this article, the latest stable CoreOS version is 1465.6.0.

So make sure to find an AMI that is as close to the latest version as possible.

Next, we need to define a security group for our cluster. For simplicity, I’m going to make this security group open to the world. Security is important, but this tutorial serves an educational purpose; you should never have all ports open in production.

And finally, we will define our cluster which consists of 3 Nodes:

In order to bring up an etcd cluster, I used a cloud-config file that I passed as a parameter to the user_data attribute:

Note: Make sure to grab the discovery token, and place it into the discovery parameter:
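For context, the public discovery service issues one URL per cluster: a request to `https://discovery.etcd.io/new?size=N` returns a URL whose last path segment is the token. The Python sketch below only builds that request URL and extracts the token from a returned URL; the function names and the sample token are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode, urlparse

DISCOVERY_BASE = "https://discovery.etcd.io"


def new_cluster_url(size):
    """Build the URL that asks the discovery service for a new token."""
    if size < 1:
        raise ValueError("cluster size must be at least 1")
    return f"{DISCOVERY_BASE}/new?{urlencode({'size': size})}"


def token_from(discovery_url):
    """Extract the token (last path segment) from a discovery URL."""
    return urlparse(discovery_url).path.strip("/")


print(new_cluster_url(3))
# Made-up token, used only to show the URL shape.
print(token_from("https://discovery.etcd.io/0123abcd"))
```

The size should match the number of nodes in the cluster, which is 3 in this setup.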

Once you have defined all the required templates, just type the following command to bring up the etcd cluster:

Note: Don’t forget to set the AWS credentials as environment variables beforehand:

Setting up an etcd cluster in action is shown below 😎 :

Once done, go to your AWS Management Console then navigate to your EC2 Dashboard:

Congratulations! 🎉🎉 You have your CoreOS cluster.

To verify the cluster health, you can either point your browser to the discovery URL you generated earlier:

or SSH to one of your cluster nodes using the command:

Then, use the etcd command line to fetch the cluster status:

Now we have an etcd cluster ready to use. Let’s see what we can do with it:

  • Through etcdctl:

  • Through HTTP API:
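As a sketch of the HTTP route, the Python snippet below builds requests for etcd's v2 keys API, where a key lives at `/v2/keys/<key>` and is written with a form-encoded `value` field. It assumes the cluster exposes the v2 API on the default client port 2379; the node address and key name are placeholders, and the requests are constructed but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def key_url(endpoint, key):
    """Return the v2 keys API URL for `key` on the given etcd endpoint."""
    return f"{endpoint.rstrip('/')}/v2/keys/{key.lstrip('/')}"


def put_key(endpoint, key, value):
    """Build (but do not send) a request that sets `key` to `value`."""
    body = urlencode({"value": value}).encode()
    return Request(key_url(endpoint, key), data=body, method="PUT")


def get_key(endpoint, key):
    """Build (but do not send) a request that reads `key`."""
    return Request(key_url(endpoint, key))


# Placeholder node address; replace it with one of your cluster nodes.
request = put_key("http://10.0.0.10:2379", "/message", "hello")
print(request.get_method(), request.full_url)
```

Passing either request to `urllib.request.urlopen` against a reachable node would perform the call.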


Setup Docker Swarm on AWS using Ansible & Terraform

This post is part of the “IaC” series explaining how to use Infrastructure as Code concepts with Terraform. In this part, I will show you how to set up a Swarm cluster on AWS using Ansible & Terraform, as shown in the diagram below (1 Master and 2 Workers), in less than 1 min ⏱:


All the templates and playbooks used in this tutorial can be found on my Github. 😎

Note: I wrote some tutorials about how to get started with Terraform on AWS, so make sure you read them before you go through this post.

1 – Setup EC2 Cluster using Terraform

1.1 – Global Variables

This file contains environment-specific configuration like the region name, instance type …

1.2 – Config AWS as Provider

1.3 – Security Group

This Security Group allows all inbound and outbound traffic:

1.4 – EC2 Instances

A bootstrap script installs the latest version of Docker:

2 – Transform to Swarm Cluster with Ansible

The playbook is self-explanatory:

Now that we have defined all the required templates and the playbook, we only need to type 2 commands to bring up the Swarm cluster:

Note: Make sure to update the hosts file with the public ip of each EC2 instance.
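Filling in the hosts file by hand is easy to get wrong, so here is a small Python sketch that renders an INI-style Ansible inventory from a list of IPs. The group names `masters` and `workers` are hypothetical and must match whatever groups the playbook targets; the addresses come from a documentation-only range.

```python
def render_inventory(master_ips, worker_ips):
    """Render an INI-style Ansible inventory with one group per node role."""
    lines = ["[masters]", *master_ips, "", "[workers]", *worker_ips]
    return "\n".join(lines) + "\n"


# Placeholder addresses; use the public IP of each EC2 instance.
inventory = render_inventory(["203.0.113.10"], ["203.0.113.11", "203.0.113.12"])
print(inventory, end="")
```

The output can be written to the hosts file before running the playbook.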

Setting up the Swarm cluster in action is shown below 😃 :

Manage AWS VPC as Infrastructure as Code with Terraform

In this tutorial, I will show you how to set up a VPC as described in the network diagram below in less than 1 min ⏱ using Terraform:

The VPC topology above is the best demonstration of what will be implemented:

  • The private subnet is inaccessible to the internet (both in and out)
  • The public subnet is accessible, and all its traffic is routed directly to the Internet Gateway
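Before writing the templates, it can help to sanity-check the address plan. The Python sketch below uses the standard `ipaddress` module to confirm that each subnet sits inside the VPC range and that no two subnets overlap. The CIDR blocks are placeholders, not the values used in this tutorial.

```python
import ipaddress


def layout_problems(vpc_cidr, subnet_cidrs):
    """Return a list of human-readable problems with a VPC address plan."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    problems = []
    for subnet in subnets:
        if not subnet.subnet_of(vpc):
            problems.append(f"{subnet} is outside {vpc}")
    for index, first in enumerate(subnets):
        for second in subnets[index + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")
    return problems


# Placeholder plan: one public and one private /24 inside a /16 VPC.
print(layout_problems("10.0.0.0/16", ["10.0.1.0/24", "10.0.2.0/24"]))
```

An empty list means the plan is consistent; any entries describe what to fix.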

Before we dive in, all the code used in this demo is available on my Github.

Note: I already wrote a tutorial on how to get started with Terraform, so make sure to read it for more details.

1 – Global variables

This file contains environment-specific configuration like the region name, CIDR blocks, and AWS credentials …

2 – Configure the AWS provider

3 – Create a VPC

4 – Create Subnets

To make the public subnet addressable from the internet, we need an Internet Gateway:

5 – Internet Gateway

To route traffic from the public subnet to the internet through the Internet Gateway, we need to create a new Route Table.

6 – Route Table

Next, we will create a security group for each subnet.

7 – Security Groups

7.1 – WebServer SG

This Security Group allows HTTP/HTTPS and SSH connections from anywhere.

7.2 – Database SG

This Security Group allows MySQL (port 3306), ping, and SSH connections only from the public subnet.
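To make the intent of the Database SG concrete, the rule set can be modelled in a few lines of Python: a connection is accepted only if it originates in the public subnet and is ICMP (ping) or TCP on port 22 or 3306. The public subnet CIDR below is a placeholder, and this is an illustration of the rule logic rather than anything AWS evaluates.

```python
import ipaddress

PUBLIC_SUBNET = ipaddress.ip_network("10.0.1.0/24")  # placeholder CIDR
ALLOWED_TCP_PORTS = {22, 3306}  # SSH and MySQL


def db_sg_allows(source_ip, protocol, port=None):
    """Return True if the modelled Database SG would accept the connection."""
    if ipaddress.ip_address(source_ip) not in PUBLIC_SUBNET:
        return False  # only the public subnet may reach the database
    if protocol == "icmp":
        return True  # ping has no port
    return protocol == "tcp" and port in ALLOWED_TCP_PORTS


print(db_sg_allows("10.0.1.25", "tcp", 3306))
print(db_sg_allows("198.51.100.7", "tcp", 22))
```

The first call models the web server reaching MySQL; the second models an SSH attempt from outside the VPC.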

Now we will deploy the EC2 instances, but before that we need to create a key pair in order to connect later to the instances via SSH.

8 – Key Pair

9 – EC2 Instances

9.1 – WebServer Instance

This instance will play the role of a web server. Therefore, we pass a shell script, install.sh, as the instance user data; it contains the commands to install an Apache server:

9.2 – Database Instance

Once you’ve defined all the required templates, make sure to set the AWS credentials as environment variables:

Note: You can always use your root user, which has permission to access everything, but from a security perspective it’s recommended to use an account with limited permissions. So create a new one using AWS IAM.
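The AWS provider reads credentials from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. A quick pre-flight check, sketched here in Python with a hypothetical helper name, reports which of them are missing before you run Terraform.

```python
import os

REQUIRED_VARIABLES = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(environ=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


missing = missing_aws_credentials()
if missing:
    print("Set these variables first:", ", ".join(missing))
else:
    print("AWS credentials found in the environment.")
```

Passing a dictionary instead of relying on `os.environ` makes the check easy to test.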

To see how Terraform plans to create the resources, type “terraform plan”. To create the infrastructure, type “terraform apply”:

That will bring up the VPC and all the necessary resources. In your AWS Management Console, you should now see the created resources:

If you click on the “Subnets” menu, you should see the public & private subnets:

The same goes for the Route Tables:

And the Internet Gateway:

Security Groups also:

WebServer Security group:

Database Security Group:

And finally the EC2 Instances:

WebServer Instance:

Database Instance:

Don’t forget to destroy the resources when they are no longer needed by typing “terraform destroy”: