Set up a Terraform backend on S3 with AWS CloudFormation and CDK

Terraform is one of the most popular Infrastructure as Code (IaC) tools. Like other IaC tools, it lets you define a descriptive model of your cloud infrastructure and store it in one or more text files. This model describes the desired state of your infrastructure and can be used to reliably deploy, version and update your system.

Terraform uses persisted state data to keep track of the resources it manages. Most production-grade configurations store this state remotely, which allows multiple people to access it and work together. Storing the state remotely also increases security, because the state no longer depends on the computer of whoever is working on the cloud infrastructure.

Since Terraform is cloud-agnostic, it supports storing the state data in many different ways. Terraform uses the term backend to refer to the system that stores this state data, and it supports many backends out of the box. Among the many options, one can choose to leverage AWS.

The backend using AWS requires an S3 bucket and, optionally, a DynamoDB table to enable state locking to avoid collisions between multiple actors.

We can use the AWS console to create these resources. Once done, we can instruct Terraform to use them by defining a backend block.

terraform {
    backend "s3" {
        bucket = "terraform-state"
        region = "eu-north-1"
        key = "path/to/terraform.tfstate"
        dynamodb_table = "terraform-state-lock"
    }
}

Now we can use the CLI tool to let Terraform initialize our backend.

$ terraform init

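This initializes the backend, so Terraform will keep the state of the system described by our Terraform application in a file in our bucket. The file itself is written the first time we apply the configuration. From now on, we can add resources and reliably deploy them in the cloud.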

As usual, I’m skipping the authentication and authorization bits needed to deal with AWS.

So, everything works. We can create new applications and let Terraform take care of creating and configuring the resources and, most importantly, persist the state file in the shared bucket.

But the more I got used to Terraform or any other IaC tool, the more I got weary of that S3 bucket and DynamoDB table created via the console.

The problem

Can we use Infrastructure as Code to define the infrastructure needed by Terraform to operate?

The short answer is yes. We can definitely use Terraform to define and deploy an S3 bucket and a DynamoDB table, as shown in this post by The Lazy Engineer.

This would define an infrastructure of a higher order. The problem is that we don’t have a backend for this application and we would be relying on tracking its state on our computer.

Staggered deployment

Looking for a solution, I found this blog post by Shi Han.

In that post, the author suggests a staggered deployment. They first deploy the S3 bucket while the state is still kept locally, then add a backend that points at that bucket, and finally re-initialize the application so that the bucket stores the state of the bucket itself.

The name they give to that section, The chicken-and-egg problem, is definitely fitting.

Even if it works correctly, I’m not really satisfied with this solution.

Shi Han’s solution is based on a trick that contradicts one of the cornerstones of Infrastructure as Code: your code files should be a valid representation of your system at any given time.

CloudFormation and CDK

How do you break a chicken-and-egg problem? You change the context. If Terraform can’t be used to set up the infrastructure it needs, we can look at other tools. At first I was looking at other backend providers to be used for our higher-order architecture but none of the alternatives caught my eye.

I eventually decided to leverage CloudFormation and its CDK (Cloud Development Kit).

While I am not enthusiastic about using two different technologies (CloudFormation and Terraform) for the same job, i.e. describing my cloud infrastructure, I am happy enough because:

  • CloudFormation is available to all AWS accounts, with no extra setup
  • The CDK makes it easy enough to work with CloudFormation by hiding all its quirks
  • I consider it acceptable to use different technologies for two different levels of abstraction

Careful readers may wonder whether we really solved the chicken-and-egg problem. The answer is yes, because CloudFormation itself persists the state of the applications it manages (stacks, in CloudFormation’s lingo), so we don’t have to provision any storage for that state beforehand.

So, let’s see how we can leverage the CDK to define and deploy the infrastructure needed by Terraform’s backend. Specifically, I’ll be writing a CDK application using the C# template.

Preparation

Working with the CDK requires a few runtimes. Let’s assume that the tools we need, such as Node.js (for npm) and the .NET SDK (for the C# template), have already been installed and configured.

Once these are in place, let’s use npm to install the CDK CLI. We can then validate that the CDK CLI is correctly installed.

$ npm install -g aws-cdk
$ cdk --version
2.51.1 (build 3d30cdb)

Finally, before we can use the CDK to deploy CloudFormation stacks, some supporting resources must be deployed to the target AWS account and region. This process is called bootstrapping.

# Bootstrap CDK for your account using `cdk bootstrap aws://ACCOUNT-NUMBER/REGION`
$ cdk bootstrap aws://123456789012/eu-north-1

You can read more about bootstrapping your account in the AWS CDK documentation.
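If you don’t have the account number at hand, it can be looked up with the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and credentials are configured; the function name bootstrap_target is made up for this example.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the aws://ACCOUNT-NUMBER/REGION target
# expected by `cdk bootstrap`.
# $1: AWS region (this post uses eu-north-1)
bootstrap_target() {
    region="$1"
    # Ask STS which account the current credentials belong to.
    account="$(aws sts get-caller-identity --query Account --output text)"
    printf 'aws://%s/%s\n' "$account" "$region"
}

# Example (needs valid AWS credentials and the CDK CLI):
#   cdk bootstrap "$(bootstrap_target eu-north-1)"
```

The function only builds the target string; the bootstrap command itself is left as a comment so that nothing is deployed by accident.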

Now everything is ready for us to create our CDK app.

Creating the CDK app

We start by creating a folder for the app, and then we use the CDK CLI to generate an app from the C# template.

$ mkdir TerraformBackend
$ cd TerraformBackend
$ cdk init app --language csharp

Once the template is generated, we have a .NET solution that can be customized to include the resources we need.

Customizing the stack

The solution contains a C# project with two main files:

  • Program.cs contains the code needed to initialize the CDK app.
  • TerraformBackendStack.cs contains the class that we will use to add our resources.

Let’s start by adding the resources to the TerraformBackendStack. To do so, we simply augment the internal constructor generated by the template.

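// Namespaces needed at the top of TerraformBackendStack.cs
using Amazon.CDK;
using Amazon.CDK.AWS.DynamoDB;
using Amazon.CDK.AWS.S3;
using Constructs;

internal TerraformBackendStack(Construct scope, string id, IStackProps props = null)
    : base(scope, id, props)
{
    // Bucket for the state files. No BucketName is set, so the name is generated.
    var bucket = new Bucket(this, "terraform-state", new BucketProps
    {
        Versioned = true, // keep previous versions of the state file
        Encryption = BucketEncryption.S3_MANAGED,
        BlockPublicAccess = BlockPublicAccess.BLOCK_ALL
    });

    // Table used by Terraform for state locking.
    var table = new Table(this, "terraform-state-lock", new TableProps
    {
        TableName = "terraform-state-lock",
        BillingMode = BillingMode.PROVISIONED,
        ReadCapacity = 10,
        WriteCapacity = 10,
        // Fully qualified to avoid a clash with System.Attribute.
        PartitionKey = new Amazon.CDK.AWS.DynamoDB.Attribute { Name = "LockID", Type = AttributeType.STRING }
    });

    // Expose the generated bucket name and the table name as stack outputs.
    new CfnOutput(this, "TerraformBucket", new CfnOutputProps
    {
        ExportName = "terraform-state-bucket-name",
        Value = bucket.BucketName
    });

    new CfnOutput(this, "TerraformTable", new CfnOutputProps
    {
        ExportName = "terraform-state-lock-table-name",
        Value = table.TableName
    });
}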

In the snippet above, I add an S3 bucket, whose name will be generated automatically by CloudFormation, and a DynamoDB table. The table’s partition key is a string attribute named LockID, which is what Terraform’s S3 backend expects from a lock table.

I also add two outputs so that I can easily fetch the names of the bucket and of the table.

Next, I change the Program so that the stack is protected from accidental deletion, whether initiated by other actors or by a misclick in the Console. Finally, I make sure that all resources are tagged following my company’s policy.

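// Body of Main in Program.cs (requires `using Amazon.CDK;`)
var app = new App();
var stack = new TerraformBackendStack(app, "TerraformBackend", new StackProps
{
    // Block stack deletion until protection is explicitly disabled.
    TerminationProtection = true
});

// Tags applied to the stack propagate to the resources it contains.
Tags.Of(stack).Add("Project", "TerraformBackend");
Tags.Of(stack).Add("Environment", "Shared");

app.Synth();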

With these changes, we’re ready to deploy our stack.

Deploying the stack

The CDK makes it very easy to deploy the stack.

From the root of our CDK project, we simply need to run cdk deploy to initiate the creation or update of the stack on CloudFormation.

When everything is complete, the CDK CLI will print the outputs that we defined in the TerraformBackendStack:

$ cdk deploy
...
Outputs:
TerraformBackend.TerraformBucket = some-very-random-string
TerraformBackend.TerraformTable = terraform-state-lock

Now we can use the two output values to correctly initialize our Terraform applications.
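The output values don’t have to be copied from the terminal; they can also be read back from CloudFormation later. Here is a minimal sketch, assuming the AWS CLI is installed and configured for the same account and region; the function name stack_output is made up for this example.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the value of a single CloudFormation stack output.
# $1: stack name
# $2: output key (the id passed to CfnOutput in the stack)
stack_output() {
    aws cloudformation describe-stacks \
        --stack-name "$1" \
        --query "Stacks[0].Outputs[?OutputKey=='$2'].OutputValue" \
        --output text
}

# Example (needs valid AWS credentials):
#   bucket="$(stack_output TerraformBackend TerraformBucket)"
#   table="$(stack_output TerraformBackend TerraformTable)"
```

The stack name and output keys in the example are the ones printed by cdk deploy above.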

terraform {
    backend "s3" {
        bucket = "some-very-random-string"
        region = "eu-north-1"
        key = "path/to/terraform.tfstate"
        dynamodb_table = "terraform-state-lock"
    }
}
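Since the bucket name is generated by CloudFormation, it may be convenient to keep it out of the Terraform files and pass it in when initializing. Terraform supports this through partial backend configuration. The sketch below writes such a file; the function name write_backend_config and the file name backend.hcl are choices made for this example, and the values are the placeholders used above.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: write a partial configuration file for the S3 backend.
# $1: bucket name, $2: lock table name, $3: region, $4: output file
write_backend_config() {
    cat > "$4" <<EOF
bucket         = "$1"
dynamodb_table = "$2"
region         = "$3"
EOF
}

# Placeholder values taken from the deployment output shown earlier.
write_backend_config "some-very-random-string" "terraform-state-lock" "eu-north-1" backend.hcl
cat backend.hcl
```

With this file in place, the backend "s3" block only needs to declare the key, and the remaining settings can be supplied with terraform init -backend-config=backend.hcl.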

Recap

Infrastructure as Code is becoming more and more of a mindset and we should strive to always follow it. Sometimes the tooling we use has limitations that could stop us.

Terraform’s support for backend infrastructure is one such example. In this post, we explored how we can use AWS CloudFormation and its CDK to circumvent the issue and use IaC to create the infrastructure needed to work with IaC at non-trivial levels.