Network Infrastructure Weathermap

The main goal of collecting metrics is to store them for long-term use and to create graphs to debug problems or identify trends. However, storing metrics about your system isn’t enough to identify the root cause of problems and anomalies. You also need a high-level overview of your network backbone. A weathermap is perfect for a Network Operations Center (NOC). In this post, I will show you how to build one using Open Source tools only.

Icinga 2 will collect metrics about your backbone and write check results and performance data to InfluxDB (supported since Icinga 2.5). Grafana will then visualize these metrics in map form.

To get started, add your desired host configuration inside the hosts.conf file:
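For illustration, a host definition might look something like this (the host name, address, and location values below are just placeholders):

```
object Host "paris-core-router" {
  import "generic-host"

  address = "10.10.1.1"

  vars.city = "Paris"
  vars.country = "FR"
}
```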

Note: the city & country attributes will be used to create the weathermap.

To enable the InfluxdbWriter feature on your Icinga 2 installation, type the following command:
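On most installations this is a single command:

```
icinga2 feature enable influxdb
```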

Configure your InfluxDB host and database in /etc/icinga2/features-enabled/influxdb.conf (learn more about the InfluxDB configuration in the Icinga 2 documentation):
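A minimal configuration could look like the following; adjust the host, port, and database to your environment. The city and country tags assume the custom variables defined in hosts.conf above:

```
object InfluxdbWriter "influxdb" {
  host = "127.0.0.1"
  port = 8086
  database = "icinga2_metrics"
  flush_threshold = 1024
  flush_interval = 10s

  host_template = {
    measurement = "$host.check_command$"
    tags = {
      hostname = "$host.name$"
      city = "$host.vars.city$"
      country = "$host.vars.country$"
    }
  }

  service_template = {
    measurement = "$service.check_command$"
    tags = {
      hostname = "$host.name$"
      service = "$service.name$"
      city = "$host.vars.city$"
      country = "$host.vars.country$"
    }
  }
}
```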

Icinga 2 will forward all your metrics to an icinga2_metrics database. The host and service templates define how metrics are stored: the measurement represents a table by which metrics are grouped, and tags identify the measurements of specific hosts or services (notice the use of the city & country tags).

Don’t forget to restart Icinga 2 after saving your changes:
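On a systemd-based distribution:

```
systemctl restart icinga2
```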

Once Icinga 2 is up and running, it will start collecting data and writing it to InfluxDB:
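You can check that data is arriving with the influx CLI, for example (assuming InfluxDB runs locally on the default port):

```
influx -database icinga2_metrics -execute 'SHOW MEASUREMENTS'
```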

Once the data has arrived, it’s time for visualization. Grafana is widely used to generate graphs and dashboards. To create a weathermap we can use a Grafana plugin called Worldmap Panel. Make sure to install it using the grafana-cli tool:
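The plugin ID is grafana-worldmap-panel:

```
grafana-cli plugins install grafana-worldmap-panel
```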

The plugin will be installed into your Grafana plugins directory (/var/lib/grafana/plugins):

Restart Grafana, navigate to the Grafana web interface, and create a new data source:
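To restart Grafana on a systemd-based distribution:

```
systemctl restart grafana-server
```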

Create a new Dashboard:

The Group By clause should be the country code, and an alias is needed too. The alias should be in the form $tag_field_name. See the image below for an example of a query:
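As a rough illustration, the raw InfluxQL behind such a panel could look something like the query below, with $tag_country used as the alias. The hostalive measurement and the state field are assumptions based on the default host check; the state field in particular requires enable_send_metadata = true in the InfluxdbWriter:

```
SELECT last("state") FROM "hostalive" WHERE $timeFilter GROUP BY "country"
```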

Under the Worldmap tab, choose the countries option:

Finally, you should see a tile map of the world with circles representing the state of each host.

The possible values of the state field are: 0 – OK, 1 – Warning, 2 – Critical, 3 – Unknown/Unreachable.

Note: For lazy people, I created a ready-to-use dashboard that you can import from GitHub.

Real-Time Infrastructure Monitoring with Amazon Echo

Years ago, managing your infrastructure through voice was the stuff of science-fiction movies, but thanks to virtual assistants like Alexa, it has become a reality. In this post, I will show you how I was able to monitor my infrastructure on AWS using a simple Alexa Skill.

At a high level, the architecture of the skill is as follows:

I installed a data collector agent (Telegraf) on each EC2 instance to collect metrics about system usage (disk, memory, CPU, etc.) and send them to a time-series database (InfluxDB).
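The relevant part of the Telegraf configuration looks roughly like this (the InfluxDB endpoint and database name are placeholders):

```
[[outputs.influxdb]]
  urls = ["http://influxdb.example.com:8086"]
  database = "telegraf"

[[inputs.cpu]]
  percpu = false
  totalcpu = true

[[inputs.mem]]

[[inputs.disk]]
```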

Once my database is populated with metrics, Amazon Echo will transform my voice commands into intents that trigger a Lambda function, which uses the InfluxDB REST API to query the database.
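The InfluxDB REST API boils down to plain HTTP calls. For example, a query like the following returns the latest memory usage of a given host (the endpoint, database, and host tag value are illustrative):

```
curl -G 'http://influxdb.example.com:8086/query' \
  --data-urlencode "db=telegraf" \
  --data-urlencode "q=SELECT last(\"used_percent\") FROM \"mem\" WHERE \"host\" = 'paris'"
```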

Enough talking, let’s build this skill from scratch. Clone the following GitHub repository:

Create a simple fleet of EC2 instances using Terraform. Install the AWS provider:
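terraform init takes care of downloading the AWS provider plugin:

```
terraform init
```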

Set your own AWS credentials in variables.tfvars. Create an execution plan:
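Referencing that variables file:

```
terraform plan -var-file=variables.tfvars
```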

Provision the infrastructure:
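Again with the same variables file:

```
terraform apply -var-file=variables.tfvars
```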

You should see the IP address for each machine:

Log in to the AWS Management Console; you should see that your nodes have been created successfully:

To install Telegraf on each machine, I used Ansible. Update ansible/inventory with your nodes’ IP addresses as follows:
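For example (the group name and IP addresses are placeholders):

```
[instances]
35.180.10.1
35.180.10.2
35.180.10.3
```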

Execute the playbook:
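Something along these lines; the playbook file name, SSH key, and remote user depend on the repository layout and the AMI you used:

```
ansible-playbook -i ansible/inventory ansible/playbook.yml --private-key=key.pem -u ec2-user
```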

If you connect via SSH to one of the servers, you should see that the Telegraf agent is running as a Docker container:

In a few seconds, the InfluxDB database will be populated with metrics:

Sign in to the Amazon Developer Portal and create a new Alexa Skill:

Create an invocation name – aws. This is the word that will trigger the skill.

In the Intent Schema box, paste the following JSON code:
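In the classic intent schema format, it looks something like this; the intent and slot names below are illustrative, not necessarily the ones used in the repository:

```
{
  "intents": [
    {
      "intent": "GetMetric",
      "slots": [
        { "name": "Metric", "type": "METRICS" },
        { "name": "Machine", "type": "MACHINES" }
      ]
    }
  ]
}
```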

Create new slot types to store the metric type and the machine hostname:

Under Utterances, enter all the phrases that you think you might say to interact with the skill.

Click on “Next” and you will move on to a page that allows you to use an ARN (Amazon Resource Name) to link to AWS Lambda.

Before that, let’s create our Lambda function. Go to the AWS Management Console and create a new Lambda function from scratch:

Note: Select US East (N. Virginia), which is a supported region for the Alexa Skills Kit.

Make sure the trigger is set to Alexa Skills Kit, then select Next.

The code provided uses the InfluxDB client to fetch metrics from the database.

Specify the .zip file name as your deployment package when you create the Lambda function. Don’t forget to set the InfluxDB hostname & database name as environment variables:
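For instance, with the AWS CLI (the function name and variable names are assumptions; use whatever the function code expects):

```
aws lambda update-function-configuration \
  --function-name alexa-monitoring \
  --environment "Variables={INFLUXDB_HOST=influxdb.example.com,INFLUXDB_DATABASE=telegraf}"
```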

Then go to the Configuration step of your Alexa Skill in the Amazon Developer Console and enter the Lambda Function ARN:

Click on “Next”. Under the “Service Simulator” section, you’ll be able to enter a sample utterance to trigger your skill:

Memory usage:

Disk usage:

CPU usage:

Test your skill on your Amazon Echo, Echo Dot, or any Alexa device by saying, “Alexa, ask AWS for disk usage of machine in Paris”.

Butler CLI: Export/Import Jenkins Plugins & Jobs

Not long ago, I had to migrate Jenkins jobs from an old server to a new one. That’s where Stack Overflow comes into play; below are the most voted answers I found:

In spite of their advantages, those solutions come with downsides, especially if you have a large number of jobs to move or no root access to the server. But guess what? I didn’t stop there. I came up with a CLI to make your life easier and export/import not only Jenkins jobs but also plugins like a boss.

To get started, find the appropriate package for your system and download it. For Linux:
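Something along these lines, grabbing the binary from the project’s GitHub releases page; the exact asset name depends on the release, so treat this as a sketch and replace <version> with the latest tag:

```
wget https://github.com/mlabouardy/butler/releases/download/<version>/butler_linux_amd64
chmod +x butler_linux_amd64
sudo mv butler_linux_amd64 /usr/local/bin/butler
```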

Note: For Windows, make sure that the butler binary is available on the PATH. This page contains instructions for setting the PATH on Windows.

Once done, verify that the installation worked by opening a new terminal session and checking that butler is available:
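For example:

```
butler help
```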

1 – Plugins Management

To export Jenkins plugins, you need to provide the URL of the source Jenkins instance:
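With butler, this looks roughly like the command below; the exact flag names may differ slightly between versions, so check butler help if in doubt:

```
butler plugins export --server localhost:8080 --username admin --password admin
```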

As shown above, butler will dump the list of installed plugins to stdout and generate a new file, plugins.txt, containing the installed Jenkins plugins as name and version pairs:

Now, to import the plugins to the new Jenkins instance, use the command below with the URL of the target Jenkins instance as an argument:
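Again, roughly (the target URL is a placeholder):

```
butler plugins import --server jenkins.target.example.com --username admin --password admin
```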

Butler will install each plugin on the target Jenkins instance by issuing API calls.

2 – Jobs Management

To export Jenkins jobs, just provide the URL of the source Jenkins server:
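Along the same lines:

```
butler jobs export --server localhost:8080 --username admin --password admin
```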

A new directory, jobs/, will be created containing every job in Jenkins. Each job will have its own configuration file, config.xml.

Now, to import the jobs to the new Jenkins instance, issue the following command:
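Roughly (again, the target URL is a placeholder):

```
butler jobs import --server jenkins.target.example.com --username admin --password admin
```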

Butler will use the configuration files created earlier to issue API calls to the target Jenkins instance and create the jobs.

Once you are done, check Jenkins and you should see your jobs successfully created:

Hope it helps! The CLI is still in its early stages, so you are welcome to contribute to the project on GitHub.

Highly Available Docker Registry on AWS with Nexus

Have you ever wondered how you can build a highly available & resilient Docker repository to store your Docker images?


In this post, we will set up an EC2 instance inside a Security Group and create an A record pointing to the server’s Elastic IP address as follows:

To provision the infrastructure, we will use Terraform as an IaC (Infrastructure as Code) tool. The advantage of using this kind of tool is the ability to quickly spin up a new environment in a different AWS region (or a different IaaS provider) in case of an incident (disaster recovery).

Start by cloning the following GitHub repository:

Inside docker-registry folder, update the variables.tfvars with your own AWS credentials (make sure you have the right IAM policies).

I specified a shell script to be used as user_data when launching the instance. It will simply install the latest version of Docker CE and turn the instance into a Docker Swarm mode node (to benefit from the replication & high availability of the Nexus container).
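A minimal sketch of such a script, assuming a Debian/Ubuntu-style AMI (the actual script in the repository may differ):

```
#!/bin/bash
# Install the latest Docker CE release
curl -fsSL https://get.docker.com | sh
# Allow the default user to run docker commands (user name depends on the AMI)
usermod -aG docker ubuntu
# Turn the instance into a single-node Docker Swarm
docker swarm init
```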

Note: You can of course use a configuration management tool like Ansible or Chef to provision the server once it is created.

Then, issue the following command to create the infrastructure:
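Assuming the same variables file layout as before:

```
terraform apply -var-file=variables.tfvars
```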

Once created, you should see the Elastic IP of your instance:

Connect to your instance via SSH:

Verify that the Docker Engine is running in Swarm Mode:
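For example:

```
docker node ls
```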

Check if the Nexus service is running:
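Assuming the Swarm service is named nexus (the name depends on how the service was created):

```
docker service ls
docker service ps nexus
```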

If you go back to your AWS Management Console and navigate to the Route 53 dashboard, you should see that a new A record has been created, pointing to the instance IP address.

Point your favorite browser to the Nexus dashboard URL (registry.slowcoder.com:8081). Log in and create a Docker hosted repository as below:

Edit the /etc/docker/daemon.json file; it should have the following content:
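Since the registry is served over plain HTTP in this setup, Docker needs it whitelisted as an insecure registry. The port below is just an example; use whatever HTTP connector port you configured on the Nexus Docker repository:

```
{
  "insecure-registries": ["registry.slowcoder.com:5000"]
}
```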

Note: For production it’s highly recommended to secure your registry using a TLS certificate issued by a known CA.

Restart Docker for the changes to take effect:
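For example:

```
systemctl restart docker
```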

Log in to your registry with the Nexus credentials (admin/admin123):
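Using the same host and port as in daemon.json:

```
docker login registry.slowcoder.com:5000 -u admin -p admin123
```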

In order to push a new image to the registry:
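For example, tagging and pushing an nginx image (the image is just an example):

```
docker pull nginx:latest
docker tag nginx:latest registry.slowcoder.com:5000/nginx:latest
docker push registry.slowcoder.com:5000/nginx:latest
```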

Verify that the image has been pushed to the remote repository:

To pull the Docker image:
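Using the example image pushed above:

```
docker pull registry.slowcoder.com:5000/nginx:latest
```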

Note: Sometimes you end up with many unused & dangling images that can quickly take up a significant amount of disk space:

You can either use the Nexus CLI tool or create a Nexus task to clean up old Docker images:

Populate the form as below:

The task above will run every day at midnight to purge unused Docker images from the “mlabouardy” registry.

Continuous Monitoring with TICK stack

Monitoring your system is essential. It helps you detect issues before they cause major downtime that affects your customers and damages your business reputation. It also helps you plan growth based on the real usage of your system. But collecting metrics from different data sources isn’t enough: you need to personalize your monitoring to meet your own business needs and define the right alerts so that any abnormal changes in the system are reported.

In this post, I will show you how to set up a resilient continuous monitoring platform with only open source projects & how to define an event alert to report changes in the system.

Clone the following GitHub repository:

1 – Terraform & AWS

In the tick-stack/terraform directory, update the variables.tfvars file with your own AWS credentials (make sure you have the right IAM policies):

Issue the following command to download the AWS provider plugin:
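That is:

```
terraform init
```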

Issue the following command to provision the infrastructure:
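Assuming the variables file from the previous step:

```
terraform apply -var-file=variables.tfvars
```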

2 – Ansible & Docker

Update the inventory file with your instance DNS name:

Then, install the Ansible custom role:
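Typically something like the following, assuming the repository ships a requirements.yml for its roles (adjust to the actual layout):

```
ansible-galaxy install -r requirements.yml
```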

Execute the Ansible Playbook:

Point your browser to http://DNS_NAME:8083; you should see the InfluxDB admin dashboard:

Now, create an InfluxDB Data Source in Chronograf (http://DNS_NAME:8888):

Create a new dashboard as follows:

You can create multiple graphs to visualize different types of metrics:

Note: For in-depth details on how to create interactive & dynamic dashboards in Chronograf, check my previous tutorial.

To go beyond graphs and do something like alerting on the collected data, make sure to enable Kapacitor:

Define a new alert to send a Slack notification if the CPU utilization is higher than 70%.
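You can build this rule in Chronograf’s alert rule editor; under the hood it corresponds to a TICKscript roughly like the one below. The measurement and field names assume the default Telegraf CPU input, and the Slack channel is a placeholder (the Slack handler itself must be configured in kapacitor.conf or via Chronograf’s alert endpoints page):

```
stream
    |from()
        .measurement('cpu')
        .groupBy('host')
    |alert()
        .crit(lambda: "usage_user" > 70)
        .message('High CPU usage on {{ index .Tags "host" }}')
        .slack()
        .channel('#alerts')
```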

To test it out, we need to generate some workload. For this, I used stress:

Stressing the CPU:
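For example, on a Debian/Ubuntu host (use yum on Amazon Linux):

```
sudo apt-get install -y stress
stress --cpu 4 --timeout 300
```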

After a few seconds, you should receive a Slack notification.