Implementing AWS Systems Manager to Future-Proof Infrastructure Configuration Management

Cloud infrastructure has revolutionised the way organisations build and manage IT systems.

Photo by Hal Gatewood / Unsplash
This blog is featured on the Kurtosys Technology Medium Page

Cloud infrastructure has revolutionised the way organisations build and manage IT systems. Cloud adoption brings with it a host of challenges that need to be addressed in order to maintain a secure, compliant and cost-effective environment. One of the biggest hurdles in cloud infrastructure is the sheer number of choices available, especially in the realm of infrastructure as code.

Making the correct choice can be overwhelming and cause anxiety. Will the solution be compliant? How will complexity be managed? Is it secure? Will costs spiral out of control? How will the inevitable updates and changes be managed? These are some of the questions that pose a significant challenge when attempting to settle on any solution.

In this blog, we will explore the challenges Kurtosys faces maintaining compliance in cloud infrastructure and how we address them using AWS EC2, configuration management tools like Chef and a suite of resources offered by Amazon Web Services.

Infrastructure automation is hard!

In today's climate of on-demand and always-up services spread across multiple regions around the world, staying competitive in the blur of a fast-paced technology landscape demands innovative architecture design that cannot compromise on efficiency, reliability, consistency and, most of all, security. It most definitely demands automation. In fact, automation is a low-level prerequisite for any cloud infrastructure design. Good luck manually deploying a multi-region, multi-faceted cloud infrastructure that services an active modern application. It will never get off the ground.

Automation being a precondition to any infrastructure deployment certainly does not make the process of designing and implementing it any easier. The involvement of various components and technologies contributes to the complexity, thereby presenting significant challenges. Integrating automation into multiple systems and coupling them together can be difficult, especially when dealing with legacy systems or different vendor technologies.

The number of vendors to select from can sometimes be overwhelming

Automating infrastructure at scale requires a deep understanding of the underlying systems, as well as the ability to manage and scale automation processes as the infrastructure grows. Protecting the automated process from potential threats and vulnerabilities is a challenge, specifically when attempting to keep the automated infrastructure secure and follow security best practices.

An issue that is commonly overlooked is how the introduction of automation affects personnel, creating a resistance to change. This demands a leadership approach that is patient, persuasive and reassuring.

This maze of challenges makes designing reliable and secure cloud infrastructure automation difficult, but not impossible.

Our evolution of configuration management

Before Kurtosys chose AWS as its cloud vendor, the majority of our applications were hosted in an OpenStack-based private cloud environment. At the time, Puppet was the configuration management tool of choice. Puppet was a reliable tool. It uses a declarative language to define the desired state of a system, and then enforces that state by automatically making changes to the system as necessary. Desired state is configured as Puppet modules. A Puppet module is a collection of manifests, files, templates, and other resources used to configure and manage a specific aspect of a system, such as installing and configuring a web server or a database.

As the organisation continued its migration into AWS and it became necessary to begin deploying our flagship application into the new cloud environment, the decision was made to move over to Chef as the new configuration management tool. This decision was made primarily because the majority of the cloud engineers involved in the migration were more familiar with Chef. The team spent some time refactoring all the Puppet modules into the equivalent Chef cookbooks.

Chef is a popular open-source configuration management and automation platform used to manage and automate infrastructure, applications, and services. Chef automates the deployment and management of infrastructure by using a series of recipes and cookbooks that define the desired state of the infrastructure. These recipes and cookbooks can be used to automate tasks such as software installation, configuration management, and security management.

AWS OpsWorks, a configuration management service that simplifies the process of deploying and managing applications on AWS infrastructure, was introduced to manage the Chef cookbook deployments as opposed to a more traditional server-client methodology.

In AWS OpsWorks, a stack is a collection of AWS resources that work together to support an application or service, such as EC2 instances, load balancers, and databases. Each stack has its own configuration and can be used to manage different environments, such as production, staging, and development. OpsWorks provides a range of features to manage instances, such as automatic scaling, monitoring, and configuration management, which allows for easy management of infrastructure and applications. AWS OpsWorks also provided a user-friendly interface that allowed administrators to launch and configure instances in the VPC and subnets of choice.

AWS OpsWorks is a configuration management service that automates deployment, configuration, and management of applications on AWS infrastructure using Chef

Exploring the Full Capabilities of AWS Systems Manager for Cloud Infrastructure Management and Innovation

AWS Systems Manager is a service that enables users to manage and automate operational tasks across their AWS resources. At the time, we were using AWS Systems Manager in a very limited way, and this gave us an opportunity to really explore the power it could offer.

Using Systems Manager for cost reporting, patch and compliance management and overall EC2 fleet control is on our infrastructure roadmap, and this exploration is ongoing. We believe Systems Manager will be an integral part of the future administration of our EC2 infrastructure.

AWS Systems Manager also enables compliance with industry standards and regulations by providing a unified view of operational data and automating the execution of compliance policies across AWS resources.

We also had the space to really delve into the so-called “native” design solutions that Cloud vendors offer i.e. the ability to innovate in a made-to-measure fashion within a vast, resource rich ecosystem.

At Kurtosys, we designed an innovative automation strategy to deploy and configure our AWS EC2 instances using a variety of AWS services as well as the Chef configuration management tool, focusing on security, cost and compliance.

Streamlining AWS EC2 Instance Launches: An Innovative Guide

Creating resources in AWS should always be done in a controlled and measured way. Launching EC2 instances is no different. When we designed our new automation, controlling cost, enforcing security and ensuring compliance were at the top of our list. We wanted to make sure that any instance was launched by an authorised principal, preferably an accredited Single Sign On (SSO) IAM role. This would ensure security measures were in place for the task of launching any instance. The next step was to ensure that the instance that was launched was compliant.

Compliance in this case revolves around instance tagging. We determine an instance to be compliant if all the necessary tags are present and hold valid values. If a tag is missing or has an incorrect value, the instance should not even be allowed to launch, let alone move towards the configuration stage. Non-compliant instances within our fleets or sprawled across our accounts have security and cost implications that we wanted to avoid from the outset. We also wanted to monitor which instances were being deployed, where and by whom. This meant we needed a way to report on any instance that was deployed in any region and in any account.
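To make the tagging rule concrete, here is a minimal sketch of such a check in Python. The tag keys and the value patterns in `REQUIRED_TAGS` are assumptions made purely for illustration; they are not the actual Kurtosys tag set.

```python
import re

# Hypothetical rule set: tag key -> regular expression the value must fully match.
REQUIRED_TAGS = {
    "Application": r"[a-z0-9-]+",
    "Environment": r"(production|staging|development)",
    "Owner": r"[^@\s]+@[^@\s]+",
}


def find_tag_violations(tags, rules=REQUIRED_TAGS):
    """Return a list of readable violations; an empty list means compliant."""
    violations = []
    for key, pattern in rules.items():
        value = tags.get(key)
        if value is None:
            violations.append(f"missing tag: {key}")
        elif not re.fullmatch(pattern, value):
            violations.append(f"invalid value for {key}: {value!r}")
    return violations


compliant = {
    "Application": "billing-api",
    "Environment": "staging",
    "Owner": "team@example.com",
}
non_compliant = {"Application": "billing-api", "Environment": "qa"}

print(find_tag_violations(compliant))      # []
print(find_tag_violations(non_compliant))  # one invalid value, one missing tag
```

Returning the full list of violations, rather than stopping at the first one, makes it easier to report everything that is wrong with a launch in a single notification.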

After having a decent grip on the requirements and understanding that AWS Systems Manager would be our core automation model, we settled on a number of resources that would meet our compliance and

The EC2 instance Launch Sequence with Systems Manager

Launch Templates

EC2 instances can be launched in a multitude of ways. This can be done via CloudFormation, the AWS Cloud Development Kit (CDK), the AWS Software Development Kit (SDK), the AWS command line interface (CLI), the AWS console and, of course, AWS Launch Templates.

AWS Launch Templates are a feature of AWS that provides a simplified and flexible way to launch EC2 instances. This is the method that we have chosen because it allows us to store and manage the configuration information for our instances in a single place, reducing the time and effort required to launch new instances.

AWS Launch Templates allow you to automate the launch of instances and maintain consistency across instances

The launch templates also allow us to manage multiple versions of an instance configuration, including the instance type, storage, security groups, and more. This is especially useful since we frequently launch instances and need a streamlined and efficient process.

In our case, each application within our software stack that requires EC2 instances has its own launch template. The instance that is eventually launched will be configured for that application.

Each launch template is configured with the required tags that enforce compliance and facilitate the Chef configuration process. Once the launch template meets the requirements, its latest version is set as the default and it is ready to launch instances.
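As a rough illustration of how tags sit inside a launch template, the sketch below builds a dictionary in the shape of the EC2 `LaunchTemplateData` structure. The AMI ID, instance type and tag values are placeholders, and the helper name `build_launch_template_data` is hypothetical.

```python
def build_launch_template_data(image_id, instance_type, tags):
    """Build a LaunchTemplateData-shaped dict that tags instances at launch."""
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "TagSpecifications": [
            {
                # "instance" applies the tags to the EC2 instance itself.
                "ResourceType": "instance",
                "Tags": [{"Key": k, "Value": v} for k, v in sorted(tags.items())],
            }
        ],
    }


template_data = build_launch_template_data(
    image_id="ami-0123456789abcdef0",  # placeholder AMI ID
    instance_type="t3.medium",         # placeholder instance type
    tags={"Application": "billing-api", "Environment": "staging"},
)
print(template_data["TagSpecifications"][0]["Tags"])
```

With boto3, a dictionary like this could be passed as `LaunchTemplateData` to `create_launch_template_version`, and the new version could then be promoted with `modify_launch_template` and its `DefaultVersion` argument. Whether the team manages its templates this way is not stated in this post.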

EventBridge and AWS Lambda

Amazon EventBridge is a fully managed event bus service offered by AWS. It allows one to securely route events between AWS services, SaaS applications, and custom applications, and to execute automated tasks and processes based on those events. This type of design is called event-driven architecture. In our case, the event would be the launching of EC2 instances.

EventBridge provides a flexible and scalable solution for routing events to multiple targets, including AWS services like Lambda, SNS, and SQS, as well as third-party services like PagerDuty and Datadog.
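An EventBridge rule selects events with an event pattern. The sketch below shows a pattern for EC2 instances entering the `running` state, together with a deliberately simplified matcher that illustrates how such a pattern filters events. Which event the team actually subscribes to is not stated here, so the pattern is an assumption, and `matches` covers only exact-value matching, not the full EventBridge pattern syntax.

```python
import json

# Pattern for EC2 instances that have just entered the "running" state.
EC2_RUNNING_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running"]},
}


def matches(pattern, event):
    """Simplified matcher: each pattern key must exist in the event and the
    event value must equal one of the listed values (nested dicts recurse)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "region": "eu-west-1",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},
}

print(json.dumps(EC2_RUNNING_PATTERN, indent=2))
print(matches(EC2_RUNNING_PATTERN, sample_event))
```

Fields that the pattern does not mention, such as `region`, are ignored, which is why a single rule can cover launches from any location.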

For our compliance model, we selected Lambda functions as a flexible method to ensure each instance conforms to specifications required for it to pass as compliant.

EventBridge would pass the EC2 launch event as a JSON payload to the Lambda function, where the data would be evaluated, most specifically the tags. A minimal set of default tags should always exist, and their values should be specific to the location in which the instance is being launched and the nature of the application running on the instance. If all the compliance checks pass, the instance is allowed to proceed to the application configuration stage and a compliance notification is fired off to Slack.

An example of a compliant EC2 launch notification

If the instance does not meet the compliance criteria, it is shut down and tagged with the non-compliance result. A launch failure notification is also sent to Slack with all the required details.

An example of a non-compliant EC2 launch notification

With this process in place, the team is confident that any EC2 instance launched in a non-compliant manner will never remain in a running state, ensuring that EC2 sprawl and cost are kept to a minimum. Knowing that each EC2 instance running in our environment has a very specific purpose and is being reported on also puts our security and compliance officers at ease.
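The two outcomes described above can be sketched as a small routing function. The AWS and Slack calls are injected as plain callables so the flow runs without credentials; the names `handle_launch`, `REQUIRED_TAG_KEYS` and the `ComplianceResult` tag are hypothetical, and "shut down" is interpreted here as stopping the instance.

```python
REQUIRED_TAG_KEYS = ("Application", "Environment", "Owner")  # hypothetical keys


def handle_launch(instance_id, tags, stop_instance, tag_instance, notify, configure):
    """Route a newly launched instance down the compliant or non-compliant path."""
    missing = [key for key in REQUIRED_TAG_KEYS if not tags.get(key)]
    if missing:
        result = "missing tags: " + ", ".join(missing)
        stop_instance(instance_id)                                # shut it down
        tag_instance(instance_id, {"ComplianceResult": result})   # record why
        notify(f"Launch FAILED for {instance_id}: {result}")
        return False
    notify(f"Launch compliant for {instance_id}")
    configure(instance_id)  # hand over to the configuration stage
    return True


# Usage with in-memory stand-ins instead of real AWS and Slack calls.
calls = []
ok = handle_launch(
    "i-0123456789abcdef0",
    {"Application": "billing-api"},
    stop_instance=lambda i: calls.append(("stop", i)),
    tag_instance=lambda i, t: calls.append(("tag", i, t)),
    notify=lambda msg: calls.append(("notify", msg)),
    configure=lambda i: calls.append(("configure", i)),
)
print(ok)
print(calls)
```

In a real Lambda function the stand-ins would typically wrap boto3 calls such as `stop_instances` and `create_tags`, plus an HTTP request to a Slack webhook. That wiring is left out because the post does not describe it.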

Once a compliant instance is launched and running, the Lambda function passes the instance ID to Systems Manager Documents for the configuration stage.
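One plausible way to make that hand-off is the Systems Manager `StartAutomationExecution` API. The sketch below only builds the request, so it runs without AWS access. The document name `Example-ConfigureInstance` and the `InstanceId` parameter name are assumptions, since the real names depend on how the Automation Document is written.

```python
def automation_request(document_name, instance_id):
    """Build keyword arguments for a StartAutomationExecution call."""
    return {
        "DocumentName": document_name,
        # Automation parameters are passed as lists of strings.
        "Parameters": {"InstanceId": [instance_id]},
    }


request = automation_request("Example-ConfigureInstance", "i-0123456789abcdef0")
print(request)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("ssm").start_automation_execution(**request)
```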

AWS Systems Manager Documents

AWS Systems Manager Documents is a capability of Systems Manager that enables you to create, manage, and share documents that define the actions Systems Manager should perform on your infrastructure and applications. With Systems Manager Documents, you can define a set of steps, known as an automation document, written in YAML or JSON, that can be used to perform tasks such as patching, software installation, and maintenance across your entire infrastructure.

In our case, we have two customised Automation Documents that execute the desired application configurations on our instances in collaboration with the Chef configuration manager.

AWS Systems Manager Documents define a set of actions to be performed on managed instances, enabling automation and standardisation of operational tasks across AWS resources.
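For a sense of what an automation document contains, here is a minimal skeleton expressed as a Python dictionary and serialised to JSON, one of the formats Systems Manager accepts. It is an illustrative sketch, not one of the two customised documents mentioned above; the step name, description and placeholder command are invented.

```python
import json

automation_document = {
    "schemaVersion": "0.3",  # schema version used by Automation documents
    "description": "Illustrative runbook: run shell commands on one instance.",
    "parameters": {
        "InstanceId": {"type": "String", "description": "Target EC2 instance ID"},
    },
    "mainSteps": [
        {
            "name": "configureInstance",  # hypothetical step name
            "action": "aws:runCommand",
            "inputs": {
                "DocumentName": "AWS-RunShellScript",
                "InstanceIds": ["{{ InstanceId }}"],
                "Parameters": {"commands": ["echo 'configuration steps go here'"]},
            },
        }
    ],
}

document_json = json.dumps(automation_document, indent=2)
print(document_json)
```

The `{{ InstanceId }}` placeholder is resolved by Systems Manager from the parameters supplied when the automation is started, which is how a single document can configure any instance passed to it.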

Chef automation

Chef is a popular open-source configuration management and automation platform used to manage and automate infrastructure, applications, and services. As stated previously, Chef uses the concept of recipes and cookbooks that can be used to automate tasks such as software installation, configuration management, and security management. Chef is particularly useful for automating a host of IT operations, reducing manual effort, and improving consistency and reliability in the cloud and on-premises.

Each application in our stack that resides on an EC2 instance has its own Chef cookbook. Some cookbooks are cross-referenced to install multiple applications or requirements, like access to AWS resources, utilising the power and flexibility of Chef.

The Chef configuration chef-client run executes a set of recipes and resources defined in cookbooks, enabling configuration management and automation of infrastructure resources.

Chef has a concept of JSON attributes. JSON attributes are a type of configuration data used by the configuration management tool. These attributes are defined in JSON format and can be used to specify various settings for Chef cookbooks, recipes, and other components. JSON attributes were commonly used in our previous automation iteration using OpsWorks. However, since OpsWorks was no longer in use and Systems Manager was now the automation mechanism, the use of JSON attributes needed to be refactored.

JSON attributes, in simple terms, are merely a store for data used by the recipe. Some sort of store is necessary in the development of the cookbooks to define dynamic variables. The refactoring of the cookbooks utilised AWS Secrets Manager to store the required configuration variables. AWS Secrets Manager is a fully managed AWS service that allows one to easily and securely store and retrieve secrets such as database credentials, API keys, and other sensitive information. This solution worked well, as we were able to easily reference our required configuration variables in the Chef cookbook via Secrets Manager with confidence that access was secure.

An example of Secrets Manager reference in a Chef cookbook
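A cookbook would perform this lookup in Chef's Ruby DSL. As a language-neutral illustration of the same pattern, the Python sketch below fetches a secret and decodes its JSON payload into configuration values. The secret name, its keys and the `FakeSecretsClient` stand-in are all hypothetical; the stand-in mimics only the `get_secret_value` call and the `SecretString` field of the response.

```python
import json


def load_app_config(client, secret_id):
    """Fetch a secret and decode its JSON SecretString into a dict."""
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


class FakeSecretsClient:
    """In-memory stand-in so the sketch runs without AWS credentials."""

    def __init__(self, secrets):
        self._secrets = secrets

    def get_secret_value(self, SecretId):
        return {"Name": SecretId, "SecretString": self._secrets[SecretId]}


client = FakeSecretsClient(
    {"example/billing-api": '{"db_host": "db.internal", "db_port": 5432}'}
)
config = load_app_config(client, "example/billing-api")
print(config["db_host"], config["db_port"])
```

Swapping the stand-in for a real Secrets Manager client would leave `load_app_config` unchanged, which keeps the configuration lookup easy to test in isolation.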

We develop our Chef cookbooks via a software development cycle. Cookbooks for each application, including the parent cookbook, reside in GitHub. Specific repository actions trigger an AWS CodePipeline sequence that places a compressed artifact in an S3 bucket, ready to be consumed by AWS Systems Manager Documents.

Like most, if not all, development cycles and processes, we are continually improving our development methods and testing to ensure all cookbooks are developed and tested to the highest standard we can achieve.

The Automation Documents retrieve the artifact from S3 and, after the Chef installation process, uncompress the repository onto the instance. They then start the Chef client run, which initiates the parent cookbook, which in turn initiates the specific application cookbook for the instance. Voilà! The instance is configured to the correct specifications.
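The sequence above can be sketched as the list of shell commands a document step might run. This is an assumption-heavy illustration: the bucket, key, working directory and cookbook name are placeholders, the artifact is assumed to be a gzipped tarball that unpacks into a Chef repository layout, and Chef is assumed to run in local mode.

```python
import shlex


def chef_bootstrap_commands(bucket, key, parent_cookbook, workdir="/opt/chef-repo"):
    """Return shell commands: fetch the artifact, unpack it, run chef-client."""
    source = f"s3://{bucket}/{key}"
    archive = "/tmp/" + key.rsplit("/", 1)[-1]
    run_list = f"recipe[{parent_cookbook}]"
    q = shlex.quote  # quote each value so it is safe to embed in a shell command
    return [
        f"aws s3 cp {q(source)} {q(archive)}",
        f"mkdir -p {q(workdir)}",
        f"tar -xzf {q(archive)} -C {q(workdir)}",
        f"cd {q(workdir)} && chef-client --local-mode --override-runlist {q(run_list)}",
    ]


commands = chef_bootstrap_commands("example-bucket", "cookbooks/app.tar.gz", "parent")
for command in commands:
    print(command)
```

A list like this could populate the `commands` parameter of a run-command step, keeping the download, unpack and converge stages visible as separate lines in the execution log.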

Conclusion

The adoption of cloud infrastructure has transformed the way businesses manage their IT systems. However, it comes with a myriad of challenges, especially in infrastructure as code, which can be overwhelming and cause anxiety. It is crucial to address these challenges to maintain a secure, compliant, and cost-effective environment.

Kurtosys continuously strives to implement innovative solutions in all aspects of their cloud infrastructure. In this case, it was solving automated EC2 launch and configuration challenges but this innovative spirit and culture permeates the entire organisation. Clients and stakeholders can be confident that as Kurtosys continues to achieve success in their IT operations, the value will continue to reflect in all aspects of the service offerings.