AWS CDK Pipelines: Real-World Tips and Tricks — Part 1

Jels Boulangier · Published in The Startup · 8 min read · Jan 10, 2021


Photo by Florian Wächter on Unsplash

In this article I’ll share with you some useful tips and tricks for AWS CDK Pipelines that go beyond the simple demos and can be applied in your real-world applications. Definitely take a look at the second article in this series for even more tips and tricks.

The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework for defining your cloud application resources using familiar programming languages. This is the biggest difference from other infrastructure-as-code tools, which use configuration files and special-purpose languages. Under the hood, AWS CDK creates AWS CloudFormation templates to deploy AWS resources. The power of AWS CDK is that it lets you generate a CloudFormation template of more than 500 lines, deploying 50 AWS resources, with a mere 20 lines of code! In addition to the high level of abstraction and sensible default configuration, you can include programmatic logic when defining your infrastructure, use object-oriented techniques, and create and share libraries. Moreover, you can take advantage of code completion in your IDE, since AWS CDK supports familiar programming languages such as TypeScript, JavaScript, Python, Java, and C#/.NET.

… deploys 50 AWS resources with a mere 20 lines of code!

In July 2020, the AWS CDK development team made automated deployment of infrastructure even better by introducing AWS CDK Pipelines. Before this, deploying CDK applications to multiple AWS accounts and regions was certainly doable, but it required custom scripts to tie everything together and increase automation. This is where CDK Pipelines comes into the picture. It makes automated, multi-account, multi-region deployment practical by integrating a CDK application into a continuous delivery pipeline built on AWS CodePipeline. The pipeline itself is also created with the CDK framework. On top of the whole system being defined purely in code, the pipeline created by CDK Pipelines is self-mutating! This means that you only need to deploy the pipeline once to get it started; after that, the pipeline automatically updates itself when you add new parts to the CDK application’s source code. With CDK Pipelines, teams can easily create and share “pipelines-as-code” patterns for deploying applications and infrastructure across AWS accounts and regions.

Even though CDK Pipelines is still in developer preview and subject to major changes, you can already use it in real-world applications. But be aware that because this library is still in its infancy, the community has yet to figure out the best practices for building complex applications with it. Current documentation and demos of CDK Pipelines, although useful, are limited to simple minimum viable products. Because of this, I want to share the tips and tricks I’ve put together by working on a project at my current job and through interaction with the CDK community.

Each of the next sections answers a specific question which I had — and you most likely will too. They can all be read as stand-alone and follow the simple format: “How do I do this?” These sections are not chronological, nor do they explain the basics of CDK Pipelines.

How do I structure my CDK Pipelines application?

There is no single way you must structure your CDK Pipelines application. However, through trial and error and community interaction, a good approach is to keep both application code and infrastructure code in the same project/repository, each in its own directory. At the root level, your project then looks like this:

.
├── code
└── infrastructure

The code directory contains all files, information, and configuration for the application code. Likewise, all CDK/infrastructure-related files live in the infrastructure directory. Keep these two distinct directories even if the project contains only infrastructure and no application code: it ensures standardisation across projects, and application code might be introduced in future development. Additionally, the root directory contains editor-specific hidden files/directories, a .gitignore file, a README.md file, and so on.

How do I deploy cross-environment?

All resources which need to be deployed to the same environment — an AWS account and region combination — should be in the same Stage. The resources in this stage can then be deployed to multiple environments via configuration passed when creating the stage object. To keep an overview, the decision of which environment to deploy to should happen at the PipelineStack level and not within a Stage itself. How exactly you configure multiple environments is up to you, since this is often a personal preference (and I’m not yet satisfied with my approach 🤔. Edit: In the meantime, I’ve written a second article about CDK Pipelines in which I address a useful configuration setup.).

This is an example of deploying resources to four different AWS environments. Yet only two distinct stages are created, MyStage and MyStageUsEast, each of which deploys different resources (the second one deploys to the us-east-1 region). Deploying these two stages to the spike and development environments only differs by passing a different configuration object.

A snippet of deploying the same resources to different environments just by passing a different configuration.
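In outline, it could look something like this. This is a simplified sketch with one stage deployed to two environments; the stage class, config interface, account IDs, and regions are purely illustrative:

import * as cdk from '@aws-cdk/core';
import * as pipelines from '@aws-cdk/pipelines';

// Hypothetical per-environment configuration object.
interface EnvConfig {
  readonly bucketPrefix: string;
}

// All resources that belong to one environment live in one Stage.
class MyStage extends cdk.Stage {
  constructor(scope: cdk.Construct, id: string, config: EnvConfig, props?: cdk.StageProps) {
    super(scope, id, props);
    // Instantiate this stage's stacks here and pass `config` down to them.
  }
}

// Called from the PipelineStack: the same stage class is deployed twice,
// only the target environment and the configuration object differ.
function addApplicationStages(scope: cdk.Construct, pipeline: pipelines.CdkPipeline): void {
  pipeline.addApplicationStage(new MyStage(scope, 'SpikeStage', { bucketPrefix: 'spike' }, {
    env: { account: '111111111111', region: 'eu-west-1' },
  }));
  pipeline.addApplicationStage(new MyStage(scope, 'DevelopmentStage', { bucketPrefix: 'dev' }, {
    env: { account: '222222222222', region: 'eu-west-1' },
  }));
}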

How do I share information between stacks?

Information can be shared between stacks by passing variables between the stacks in your CDK application. However, this only works for stacks that will be deployed to the same environment, and hence for stacks within the same stage. Cross-environment stack sharing is not supported. See “How do I share information between stages?” for higher-level information sharing.
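A minimal sketch of what this looks like within one stage; the stack names and the bucket are just an illustration:

import * as cdk from '@aws-cdk/core';
import * as s3 from '@aws-cdk/aws-s3';

// Hypothetical example: StorageStack creates a bucket that ComputeStack needs.
class StorageStack extends cdk.Stack {
  public readonly bucket: s3.Bucket;

  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    this.bucket = new s3.Bucket(this, 'DataBucket');
  }
}

interface ComputeStackProps extends cdk.StackProps {
  readonly bucket: s3.IBucket;
}

class ComputeStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props: ComputeStackProps) {
    super(scope, id, props);
    // Use props.bucket here, e.g. grant read access to a Lambda function.
  }
}

// Inside a Stage: both stacks are deployed to the same environment,
// so the bucket object can simply be passed from one stack to the other.
class MyStage extends cdk.Stage {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StageProps) {
    super(scope, id, props);
    const storage = new StorageStack(this, 'StorageStack');
    new ComputeStack(this, 'ComputeStack', { bucket: storage.bucket });
  }
}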

If possible, you can also write string-like variables such as an ARN to a CfnOutput, pass that CfnOutput object to the target stage, and fetch the resource with a method using the ARN in the CfnOutput object. But to be honest, I’m not sure what the benefit of this is compared to passing the resource object itself.

How do I share information between stages?

Passing variables between stages is not supported. When you try to do this, you’ll get an error containing the message “dependency cannot cross stage boundaries”. The easiest solution is to write the desired values from the first stage to the Systems Manager (SSM) Parameter Store and fetch them in the target stage.

Writing to the Parameter Store can, for example, be done like this:

Snippet of how to write a variable to the SSM parameter store.
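Something along these lines; the parameter name, stack, and stored value are illustrative:

import * as cdk from '@aws-cdk/core';
import * as s3 from '@aws-cdk/aws-s3';
import * as ssm from '@aws-cdk/aws-ssm';

// Producing stack: store the bucket ARN in the SSM Parameter Store of the
// environment this stack is deployed to.
class ProducerStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const bucket = new s3.Bucket(this, 'DataBucket');

    new ssm.StringParameter(this, 'BucketArnParam', {
      parameterName: '/my-app/bucket-arn', // illustrative name
      stringValue: bucket.bucketArn,
    });
  }
}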

This writes to the Parameter Store of the environment that this stack is deployed to.

If the target stage is deployed to the same environment — remember that this is an account and region combination — this parameter can be fetched with an ssm.StringParameter construct.

Snippet of how to read a variable from the SSM parameter store in the same AWS environment.
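For instance, again with an illustrative parameter name:

import * as cdk from '@aws-cdk/core';
import * as ssm from '@aws-cdk/aws-ssm';

// Consuming stack in the *same* environment: fetch the value with the
// regular StringParameter construct.
class ConsumerStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const bucketArn = ssm.StringParameter.valueForStringParameter(this, '/my-app/bucket-arn');
    // Use bucketArn, e.g. s3.Bucket.fromBucketArn(this, 'ImportedBucket', bucketArn).
  }
}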

However, most of the time, different stages will deploy resources to different environments. In that case, the stage that needs the information has to fetch the value using an AwsCustomResource.

Define the SSM reader class in a ssmParamReader.ts file

Snippet of a class to read a variable from the SSM parameter store in the generic AWS environment.
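A sketch of such a reader, based on the common community pattern of wrapping an AwsCustomResource; the props and method names below are my own choices, and this covers the cross-region case (reading across accounts would need additional permissions on the parameter):

// ssmParamReader.ts
import * as cdk from '@aws-cdk/core';
import {
  AwsCustomResource,
  AwsCustomResourcePolicy,
  PhysicalResourceId,
} from '@aws-cdk/custom-resources';

export interface SsmParameterReaderProps {
  readonly parameterName: string;
  readonly region: string;
}

// Reads an SSM parameter from another region at deployment time by calling
// ssm:GetParameter through a custom resource.
export class SsmParameterReader extends AwsCustomResource {
  constructor(scope: cdk.Construct, id: string, props: SsmParameterReaderProps) {
    super(scope, id, {
      onUpdate: {
        service: 'SSM',
        action: 'getParameter',
        parameters: { Name: props.parameterName },
        region: props.region,
        // Use a new physical resource id on every deployment so the value is re-read.
        physicalResourceId: PhysicalResourceId.of(Date.now().toString()),
      },
      policy: AwsCustomResourcePolicy.fromSdkCalls({
        resources: AwsCustomResourcePolicy.ANY_RESOURCE,
      }),
    });
  }

  public getParameterValue(): string {
    return this.getResponseField('Parameter.Value');
  }
}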

Then create such an object in the target stage, which allows you to fetch the parameter value.

Snippet of how to use the previous class to read a variable from the SSM parameter store in another AWS environment.
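Usage could look like this; the stack, parameter name, and region are illustrative:

import * as cdk from '@aws-cdk/core';
import { SsmParameterReader } from './ssmParamReader';

// Consuming stack deployed to a *different* environment: read the parameter
// from the region where the producing stage wrote it.
class CrossEnvConsumerStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const bucketArnReader = new SsmParameterReader(this, 'BucketArnReader', {
      parameterName: '/my-app/bucket-arn',
      region: 'eu-west-1', // region of the producing stage
    });
    const bucketArn = bucketArnReader.getParameterValue();
    // Use bucketArn to import the resource, e.g. s3.Bucket.fromBucketArn(...).
  }
}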

Note that it has recently become possible to construct your own Fn::ImportValue expression to solve this issue.
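In short, the producing stack exports the value under a fixed name and the consuming stack imports it again by that name. Keep in mind that this only works when both stages deploy to the same account and region, since CloudFormation exports are regional; the names below are illustrative:

import * as cdk from '@aws-cdk/core';
import * as s3 from '@aws-cdk/aws-s3';

// Producing stage's stack: export the value under a fixed name.
class ExportingStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    const bucket = new s3.Bucket(this, 'DataBucket');
    new cdk.CfnOutput(this, 'BucketArnOutput', {
      value: bucket.bucketArn,
      exportName: 'my-app-bucket-arn', // illustrative export name
    });
  }
}

// Consuming stage's stack (same account and region): import it by name.
class ImportingStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    const bucketArn = cdk.Fn.importValue('my-app-bucket-arn');
    // Use bucketArn here.
  }
}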

How do I share information between regions?

Since resources in different regions will be in different stages, sharing information needs to happen the same way as in “How do I share information between stages?”.

How do I share information between environments?

Since resources in different environments will most often be in different stages, sharing information needs to happen the same way as in “How do I share information between stages?”.

How do I build my application code?

Building your application code can be done in the synthAction of your pipeline. The build command can simply be npm run build, where the specifics of the build are defined in the build script of the package.json file. This specific example first compiles the TypeScript infrastructure code with tsc and subsequently builds the application code using Maven. Remember that the application code and the infrastructure code (CDK Pipelines) live in two different directories, ./code and ./infrastructure, respectively.

A snippet of how to build your application code in the pipeline.
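A sketch of what that synthAction could look like with the developer-preview API; the GitHub owner, repo, and secret name are placeholders:

import * as codepipeline from '@aws-cdk/aws-codepipeline';
import * as codepipeline_actions from '@aws-cdk/aws-codepipeline-actions';
import * as cdk from '@aws-cdk/core';
import * as pipelines from '@aws-cdk/pipelines';

class PipelineStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const sourceArtifact = new codepipeline.Artifact();
    const cloudAssemblyArtifact = new codepipeline.Artifact();

    const pipeline = new pipelines.CdkPipeline(this, 'Pipeline', {
      cloudAssemblyArtifact,
      sourceAction: new codepipeline_actions.GitHubSourceAction({
        actionName: 'Source',
        owner: 'my-org',   // placeholder
        repo: 'my-repo',   // placeholder
        oauthToken: cdk.SecretValue.secretsManager('github-token'),
        output: sourceArtifact,
      }),
      synthAction: pipelines.SimpleSynthAction.standardNpmSynth({
        sourceArtifact,
        cloudAssemblyArtifact,
        subdirectory: 'infrastructure', // the CDK app lives in ./infrastructure
        buildCommand: 'npm run build',  // details are delegated to package.json
      }),
    });
  }
}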

Here is part of the package.json file where the mvn command also builds the application code.

A snippet of a part of the package.json file which specifies the details of the npm build script.
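The relevant part could look something like this; the exact tsc and Maven invocations are assumptions on my part:

{
  "scripts": {
    "build": "tsc && cd ../code && mvn clean package"
  }
}

By default, mvn package also executes the unit tests, which is what the next section relies on.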

How do I test my application code?

Just like in “How do I build my application code?”, this step can be part of the npm run build command defined in the package.json file. In that same build script, the mvn command not only builds the application code but also runs its tests. The specifics of the tests are abstracted away from the pipeline by the pom.xml file, located in the ./code directory. This shows that development of the application code can happen separately from the infrastructure code.

How do I pass my application code to my infrastructure?

After building your application code, it will most likely have to be deployed to some infrastructure resource. Such an action can be done with an action from the aws-codepipeline-actions module. Since these Actions work with Artifacts, your built application code should be converted into an Artifact.

Here is an example of deploying the build output to an S3 bucket that is also created in the CDK application. For this particular project, an AWS Elastic Beanstalk construct in a later stage needs to fetch the deployed application code from that S3 bucket. This way, the pipeline makes sure that fetching the built code will not fail.

Snippet of how to share build application code with infrastructure code.
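Putting it together, a sketch could look like this: the build output is exported from the synth action as an additional, named artifact and then copied to the bucket via an S3DeployAction on the underlying CodePipeline. The GitHub details are placeholders, and the path to the Maven output may need adjusting relative to the synth subdirectory:

import * as codepipeline from '@aws-cdk/aws-codepipeline';
import * as codepipeline_actions from '@aws-cdk/aws-codepipeline-actions';
import * as s3 from '@aws-cdk/aws-s3';
import * as cdk from '@aws-cdk/core';
import * as pipelines from '@aws-cdk/pipelines';

class PipelineStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const sourceArtifact = new codepipeline.Artifact('Source');
    const cloudAssemblyArtifact = new codepipeline.Artifact('CloudAssembly');
    const buildArtifact = new codepipeline.Artifact('BuildOutput');

    // Bucket that will hold the built application code, e.g. for Elastic Beanstalk.
    const appCodeBucket = new s3.Bucket(this, 'AppCodeBucket');

    const pipeline = new pipelines.CdkPipeline(this, 'Pipeline', {
      cloudAssemblyArtifact,
      sourceAction: new codepipeline_actions.GitHubSourceAction({
        actionName: 'Source',
        owner: 'my-org',   // placeholder
        repo: 'my-repo',   // placeholder
        oauthToken: cdk.SecretValue.secretsManager('github-token'),
        output: sourceArtifact,
      }),
      synthAction: pipelines.SimpleSynthAction.standardNpmSynth({
        sourceArtifact,
        cloudAssemblyArtifact,
        subdirectory: 'infrastructure',
        buildCommand: 'npm run build',
        // Export the Maven build output as an extra, named artifact.
        additionalArtifacts: [
          { directory: 'code/target', artifact: buildArtifact },
        ],
      }),
    });

    // Add a stage to the underlying CodePipeline that copies the build
    // output into the bucket, so later stages can rely on it being there.
    pipeline.codePipeline.addStage({
      stageName: 'DeployAppCode',
      actions: [
        new codepipeline_actions.S3DeployAction({
          actionName: 'CopyToBucket',
          input: buildArtifact,
          bucket: appCodeBucket,
        }),
      ],
    });
  }
}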

Note that when using custom Artifacts, all Artifacts need an artifactName.

How do I run docker containers inside my CDK pipeline?

CodeBuild instances are themselves Docker containers that run on EC2 instances shared with other AWS users. Because of this, it is not possible to access the host’s Docker daemon, which would allow you to manipulate other users’ containers. As an alternative, Docker-in-Docker is available on the standard build images. To be able to run a nested Docker daemon, elevated privileges need to be enabled in the build environment. Additionally, some Docker-related environment variables most likely need to be specified.

For example, to use testcontainers (for running unit tests of the application code) as Docker-in-Docker instances, you need to correctly specify the nested Docker host and explicitly disable TLS, as this is not set up for the nested Docker daemon (logically, because it does not provide any added security value in this particular scenario).

Snippet of how to run Docker images inside the pipeline.
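A sketch of the relevant synth action settings: elevated privileges are enabled on the build environment, and the nested Docker daemon is pointed to without TLS. The exact environment variables depend on your build image and on how the nested daemon is started, so treat the values below as illustrative:

import * as codebuild from '@aws-cdk/aws-codebuild';
import * as codepipeline from '@aws-cdk/aws-codepipeline';
import * as pipelines from '@aws-cdk/pipelines';

// Artifacts defined as part of the pipeline, as shown earlier.
const sourceArtifact = new codepipeline.Artifact();
const cloudAssemblyArtifact = new codepipeline.Artifact();

const synthAction = pipelines.SimpleSynthAction.standardNpmSynth({
  sourceArtifact,
  cloudAssemblyArtifact,
  subdirectory: 'infrastructure',
  buildCommand: 'npm run build',
  environment: {
    buildImage: codebuild.LinuxBuildImage.STANDARD_4_0,
    privileged: true, // required for Docker-in-Docker
  },
  environmentVariables: {
    DOCKER_HOST: { value: 'tcp://localhost:2375' }, // nested Docker daemon (illustrative)
    DOCKER_TLS_CERTDIR: { value: '' },              // disable TLS for the nested daemon (illustrative)
  },
});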

Because your pipeline’s CodeBuild instances are Docker containers running on shared EC2 instances, you are very likely to run into the new Docker image pull limits when pulling anonymously. According to Docker’s documentation, “Unauthenticated (anonymous) users will have the limits enforced via IP”, and that IP is the EC2 instance’s, shared among AWS users. So keep this in mind!

I think that for now, this is plenty of information. I do have more tips to share and these will most likely appear in a future article. (Edit: A second article has appeared! Go check it out for more CDK Pipelines tips and tricks.) Meanwhile, the AWS CDK development team continues to extend the CDK libraries at a high pace, so things might change quickly.

I hope these tips and tricks have helped you in some way!

