Komiser: AWS Environment Inspector

In order to build HA & Resilient applications in AWS, you need to assume that everything will fail. Therefore, you always design and deploy your application in multiple AZ & regions. So you end up with many unused AWS resources (Snapshots, ELB, EC2, Elastic IP, etc) that could cost you a fortune.

One pillar of AWS Well-Architected Framework is Cost optimization. That’s why you need to have a global overview of your AWS Infrastructure. Fortunately, AWS offers many fully-managed services like CloudWatch, CloudTrail, Trusted Advisor & AWS Config to help you achieve that. But, they require a deep understanding of AWS Platform and they are not straighforward.

That’s why I came up with Komiser a tool that simplifies the process by querying the AWS API to fetch information about almost all critical services of AWS like EC2, RDS, ELB, S3, Lambda … in real-time in a single Dashboard.

Note: To prevent excedding AWS API rate limit for requests, the response is cached in in-memory cache by default for 30 minutes.

Komiser supported AWS Services:

  • Compute:
    • Running/Stopped/Terminated EC2 instances
    • Current EC2 instances per region
    • EC2 instances per family type
    • Lambda Functions per runtime environment
    • Disassociated Elastic IP addresses
    • Total number of Key Pairs
    • Total number of Auto Scaling Groups
  • Network & Content Delivery:
    • Total number of VPCs
    • Total number of Network Access Control Lists
    • Total number of Security Groups
    • Total number of Route Tables
    • Total number of Internet Gateways
    • Total number of Nat Gateways
    • Elastic Load Balancers per family type (ELB, ALB, NLB)
  • Management Tools:
    • CloudWatch Alarms State
    • Billing Report (Up to 6 months)
  • Database:
    • DynamoDB Tables
    • DynamoDB Provisionned Throughput
    • RDS DB instances
  • Messaging:
    • SQS Queues
    • SNS Topics
  • Storage:
    • S3 Buckets
    • EBS Volumes
    • EBS Snapshots
  • Security Identity & Compliance:
    • IAM Roles
    • IAM Policies
    • IAM Groups
    • IAM Users

1 – Configuring Credentials

Komiser needs your AWS credentials to authenticate with AWS services. The CLI supports multiple methods of supporting these credentials. By default the CLI will source credentials automatically from its default credential chain. The common items in the credentials chain are the following:

  • Environment Credentials
  • Shared Credentials file (~/.aws/credentials)
  • EC2 Instance Role Credentials

To get started, create a new IAM user, and assign to it this following IAM policy:

Next, generate a new AWS Access Key & Secret Key, then update ~/.aws/credentials file as below:

2 – Installation

2.1 – CLI

Find the appropriate package for your system and download it. For linux:

Note: The Komiser CLI is updated frequently with support for new AWS services. To see if you have the latest version, see the project Github repository.

After you install the Komiser CLI, you may need to add the path to the executable file to your PATH variable.

2.2 – Docker Image

Use the official Komiser Docker Image:

3 – Overview

Once installed, start the Komiser server:

If you point your favorite browser to http://localhost:3000, you should see Komiser Dashboard:

Hope it helps ! The CLI is still in its early stages, so you are welcome to contribute to the project on Github.

Network Infrastructure Weathermap

The main goal of collecting metrics is to store them for long term usage and to create graphs to debug problems or identify trends. However, storing metrics about your system isn’t enough to identity the problem’s & anomalies root cause. It’s necessary to have a high-level overview of your network backbone. Weathermap is perfect for a Network Operations Center (NOC). In this post, I will show you how to build one using Open Source tools only.

Icinga 2 will collect metrics about your backbone, write checks results metrics and performance data to InfluxDB (supported since Icinga 2.5). Visualize these metrics in Grafana in map form.

To get started, add your desired host configuration inside the hosts.conf file:

Note: the city & country attributes will be used to create the weathermap.

To enable the InfluxDBWriter on your Icinga 2 installation, type the following command:

Configure your InfluxDB host and database in /etc/icinga2/features-enabled/influxdb.conf (Learn more about the InfluxDB configuration)

Icinga 2 will forward all your metrics to a icinga2_metrics database. The included host and service templates define a storage, the measurement represents a table by which metrics are grouped with tags certain measurements of certain hosts or services are defined (notice the city & country tags usage).

Don’t forget to restart Icinga 2 after saving your changes:

Once Icinga 2 is up and running it’ll start collecting data and writing them to InfluxDB:

Once our data arrived, it’s time for visualization. Grafana is widely used to generate graphs and dashboards. To create a Weathermap we can use a Grafana plugin called Worldmap Panel. Make sure to install it using grafana-cli tool:

The plugin will be installed into your grafana plugins directory (/var/lib/grafana/plugins):

Restart Grafana, navigate to Grafana web interface and create a new datasource:

Create a new Dashboard:

The Group By clause should be the country code and an alias is needed too. The alias should be in the form $tag_field_name. See the image below for an example of a query:

Under the Worldmap tab, choose the countries option:

Finally, you should see a tile map of the world with circles representing the state of each host.

The field state possible values (0 – OK, 1 – Warning, 2 – Critical, 3 – Unknown/Unreachable)

Note: For lazy people I created a ready to use Dashboard you can import from GitHub.

Real-Time Infrastructure Monitoring with Amazon Echo

Years ago, managing your infrastructure through voice was a science-fiction movie, but thanks to virtual assistants like Alexa it becomes a reality. In this post, I will show you how I was able to monitor my infrastructure on AWS using a simple Alexa Skill.

At a high level, the architecture of the skill is as follows:

I installed data collector agent (Telegraf) in each EC2 instance to collect metrics about system usage (disk, memory, cpu ..) and send them to a time-series database (InfluxDB)

Once my database is populated with metrics, Amazon echo will transform my voice commands to intents that will trigger a Lambda function, which will use the InfluxDB REST API to query the database.

Enough with talking, lets build this skill from scratch, clone the following GitHub repository:

Create a simple fleet of EC2 instances using Terraform. Install the AWS provider:

Set your own AWS credentials in variables.tfvars. Create an execution plan:

Provision the infrastructure:

You should see the IP address for each machine:

Login to AWS Management Console, you should see your nodes has been created successfully:

To install Telegraf on each machine, I used Ansible, update the ansible/inventory with your nodes IP addresses as follows:

Execute the playbook:

If you connect via SSH to one of the server, you should see the Telegraf agent is running as Docker container:

In few seconds the InfluxDB database will be populated with some metrics:

Sign in to the Amazon Developer Portal, create a new Alexa Skill:

Create an invocation name – aws – This is the word that will trigger the Skill.

In the Intent Schema box, paste the following JSON code:

Create a new slot types to store the type of metric and machine hostname:

Under Uterrances, enter all the phrases that you think you might say to interact with the skill

Click on “Next” and you will move onto a page that allows us to use an ARN (Amazon Resource Name) to link to AWS Lambda.

Before that, let’s create our lambda function, go to AWS Management Console and create a new lambda function from scratch:

Note: Select US East (N.Virginia), which is a supported region for Alexa Skill Kit.

Make sure the trigger is set to Alexa Skills Kit, then select Next.

The code provided uses the InfluxDB client to fetch metrics from database.

Specify the .zip file name as your deployment package at the time you create the Lambda function. Don’t forget to set the InfluxDB Hostname & Database name as an environment variables:

Then go to the Configuration step of your Alexa Skill in the Amazon Developer Console and enter the Lambda Function ARN:

Click on “Next“. Under the “Service Simulator” section, you’ll be able to enter a sample utterance to trigger your skill:

Memory usage:

Disk usage:

CPU usage:

Test your skill on your Amazon Echo, Echo Dot, or any Alexa device by saying, “Alexa, ask AWS for disk usage of machine in Paris

Highly Available WordPress Blog

In this post you will learn about the easiest way to deploy a fault tolerant and scalable WordPress on AWS.

To get started, setup a Swarm cluster on AWS by following this tutorial Setup Docker Swarm on AWS using Ansible & Terraform:

Now your cluster is ready to use. You are ready to go !

WordPress stores some files on disk (plugins, themes, images …) which causes a problem if you want to use a fleet of EC2 instances to run your blog in case of high traffic:

That’s where AWS EFS (Elastic File System) comes into the play. The idea is to mount shared volumes using the NFS protocol in each host to synchronize files between all nodes in the cluster.

So create an Elastic File System, make sure to deploy it in the same VPC on which your Swarm cluster is created:

Once created, note the DNS name:

Now, mount Amazon EFS file systems via the NFSv4.1 protocol on each node:

We can verify the mount with a plain df -h command:

WordPress requires a relational database. Create an Amazon Aurora database:

Wait couple of minutes, then the database should be ready, copy the endpoint of database:

To deploy the stack, I’m using the following Docker Compose file:

In addition to wordpress container, Im using Traefik as reverse proxy to be able to scale out my blog easily with docker service scale command.

In your Manager node run the following command to deploy the stack:

At this point, you should have a clean install of WordPress running.

Fire up your browser and point it to manager public IP address, you will be greeted with the familiar WordPress setup page:

If you’re expecting a high traffic, you can easily scale the WP service using the command:

Verify Traefik Dashboard:

That’s how to build a scalable WordPress blog with no single points of failure.

Highly Available Docker Registry on AWS with Nexus

Have you ever wondered how you can build a highly available & resilient Docker Repository to store your Docker Images ?

Résultat de recherche d'images pour "you came to the right place meme"

In this post, we will setup an EC2 instance inside a Security Group and create an A record pointing to the server Elastic IP address as follow:

To provision the infrastructure, we will use Terraform as IaC (Infrastructure as Code) tool. The advantage of using this kind of tools is the ability to spin up a new environment quickly in different AWS region (or different IaaS provider) in case of incident (Disaster recovery).

Start by cloning the following Github repository:

Inside docker-registry folder, update the variables.tfvars with your own AWS credentials (make sure you have the right IAM policies).

I specified a shell script to be used as user_data when launching the instance. It will simply install the latest version of Docker CE and turn the instance to Docker Swarm Mode (to benefit from replication & high availability of Nexus container)

Note: Surely, you can use a Configuration Management Tools like Ansible or Chef to provision the server once created.

Then, issue the following command to create the infrastructure:

Once created, you should see the Elastic IP of your instance:

Connect to your instance via SSH:

Verify that the Docker Engine is running in Swarm Mode:

Check if Nexus service is running:

If you go back to your AWS Management Console. Then, navigate to Route53 Dashboard, you should see a new A record has been created which points to the instance IP address.

Point your favorite browser to the Nexus Dashboard URL (registry.slowcoder.com:8081). Login and create a Docker hosted registry as below:

Edit the /etc/docker/daemon.json file, it should have the following content:

Note: For production it’s highly recommended to secure your registry using a TLS certificate issued by a known CA.

Restart Docker for the changes to take effect:

Login to your registry with Nexus Credentials (admin/admin123):

In order to push a new image to the registry:

Verify that the image has been pushed to the remote repository:

To pull the Docker image:

Note: Sometimes you end up with many unused & dangling images that can quickly take significant amount of disk space:

You can either use the Nexus CLI tool or create a Nexus Task to cleanup old Docker Images:

Populate the form as below:

The task above will run everyday at midnight to purge unused docker images from “mlabouardy” registry.