TC2

Reputation: 199

Terraform detects unexpected changes when changing the AWS provider region

I'm running into an issue where I have a null resource that runs a script to create a piece of configuration. The entire Terraform configuration needs to be run in each region of an account, so I set everything up to use the region as a variable.

This is causing Terraform to destroy the original null resource configuration in region 1 before it creates the new configuration in region 2. The other Terraform resources just create new resources in region 2 and leave the ones in region 1 in place. Is there any way to get the null resource to behave the same way as the other resources?

resource "null_resource" "config-s3-remediation" {
  triggers = {
    account_name = var.account_name
    region = var.region
  }
  depends_on = [
    aws_config_config_rule.s3_access_logging_rule,
    aws_ssm_document.s3_access_logging_ssm
  ]

  provisioner "local-exec" {
    command = "python3 ${path.module}/remediation_config.py add ${self.triggers.region}
  }

  provisioner "local-exec" {
    when    = destroy
    command = "python3 ${path.module}/remediation_config.py remove ${self.triggers.region}"
  }
}

Upvotes: 1

Views: 2008

Answers (1)

Martin Atkins

Reputation: 74259

Terraform internally tracks the relationships between the objects you describe in your configuration and the objects in the remote system using its "state", which is a data structure that is saved either to local disk or to a remote system to persist data between Terraform runs.

After you run Terraform for the first time, Terraform will save a state snapshot that includes two key types of information:

  • A copy of the values you set in the configuration, like the triggers value in your null_resource.
  • Identifiers and other data that were chosen by the remote system, such as the server-generated id of an aws_instance, or a hash of the SSM Document created by your aws_ssm_document resource.
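
If you want to see exactly what Terraform has recorded, you can print the state entry for a particular resource instance with terraform state show. For the null_resource in your configuration the output would look roughly like this, where the id and the trigger values are made up for illustration:

terraform state show null_resource.config-s3-remediation

# null_resource.config-s3-remediation:
resource "null_resource" "config-s3-remediation" {
    id       = "5987233357207204469"
    triggers = {
        "account_name" = "example-account"
        "region"       = "us-east-1"
    }
}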

Using that information, future runs of Terraform will perform the following additional actions that do not occur on the first run:

  • For any resource type that corresponds with data in a remote system, like aws_ssm_document, Terraform (more accurately: the AWS provider) will look up the object using the remote system API to make sure it still exists and, if so, to update the state data to match the current values in the remote object.
  • After that, Terraform will compare what you've written in the configuration with what is stored in the state to determine if any update or replace actions need to be taken in order to make the state and the remote objects match the configuration.

A key difference between a resource type like aws_ssm_document and one like null_resource is that an SSM document object in the state is just a proxy representing a "real" object in the AWS SSM API, while null_resource only exists in the Terraform state and has no corresponding upstream object. Therefore Terraform can notice that the SSM document it created no longer exists and proceed under that assumption (planning to create a new one), but it can make no such automatic determination about a null_resource instance.

For AWS in particular, when you change the region argument on the provider configuration you have effectively told the AWS provider to access an entirely different set of endpoints. If you initially created an SSM document in us-east-1, for example, then the state will record the details of that object. If you then change region to us-west-1, Terraform will ask the us-west-1 API endpoints if that object exists, and will be told that it does not because AWS SSM uses region-specific namespaces.

If you were to accept the plan to create a new SSM document in us-west-1 then Terraform would completely lose track of the original object in us-east-1, because as far as Terraform is concerned there is only one remote object per resource instance, and you've just told Terraform to look for and manage that object in a different AWS region, rather than to manage a new one in addition.
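
To make the symptom you described concrete: with the region changed, a terraform plan would report something roughly like the following. The exact formatting depends on your Terraform version, and most attributes are omitted here:

  # aws_ssm_document.s3_access_logging_ssm will be created
  + resource "aws_ssm_document" "s3_access_logging_ssm" {
      # (arguments omitted)
    }

  # null_resource.config-s3-remediation must be replaced
-/+ resource "null_resource" "config-s3-remediation" {
      ~ triggers = { # forces replacement
          ~ "region" = "us-east-1" -> "us-west-1"
        }
    }

The AWS-backed objects are planned for creation because the refresh step found nothing at the new region's endpoints, while the null_resource is planned for replacement because its region trigger changed. Accepting that plan runs the when = destroy provisioner with the old triggers.region value, which is exactly the unwanted removal of the region 1 configuration that you described.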


With all of that background in mind, the key problem here is that you can't just change the region argument in the AWS provider configuration to duplicate objects in a new region, because Terraform is not tracking separate objects per region. Instead, you need to arrange for the objects in each region to be tracked as separate objects in Terraform.

There are several ways to accomplish that, depending on how this fits into your overall system. One way is to declare several different AWS provider configurations and write a separate set of resource blocks for each one:

provider "aws" {
  alias = "us-east-1"

  region = "us-east-1"
}

provider "aws" {
  alias = "us-west-1"

  region = "us-west-1"
}

resource "aws_ssm_document" "us-east-1" {
  # Attach the resource to the non-default (aliased) provider configuration
  provider = aws.us-east-1

  # other settings for the document in the us-east-1 region
}

resource "aws_ssm_document" "us-west-1" {
  # Attach the resource to the non-default (aliased) provider configuration
  provider = aws.us-west-1

  # other settings for the document in the us-west-1 region
}

If your goal is to have the same constellation of objects in each region, you can avoid the duplication by factoring the resource blocks out into a separate reusable module, and then call that module once for each region with a different instance of the AWS provider passed to each:

provider "aws" {
  alias = "us-east-1"

  region = "us-east-1"
}

provider "aws" {
  alias = "us-west-1"

  region = "us-west-1"
}

module "us-east-1" {
  source = "./modules/per-region"

  providers = {
    # The default (unaliased) "aws" provider configuration
    # in this instance of the module will be the us-east-1
    # configuration declared above.
    aws = aws.us-east-1
  }
}

module "us-west-1" {
  source = "./modules/per-region"

  providers = {
    # The default (unaliased) "aws" provider configuration
    # in this instance of the module will be the us-west-1
    # configuration declared above.
    aws = aws.us-west-1
  }
}

When using this pattern, none of the resource blocks in the ./modules/per-region module should have an explicit provider argument, so they will all be associated with that module's default aws provider configuration. The module blocks shown above then ensure that each instance of the module inherits a different aws provider configuration.
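
As a rough sketch, the shared module might contain something like the following. The resource names, the aws_ssm_document arguments, and the assumption that your remediation script lives inside the module directory are placeholders rather than your real configuration; the aws_region data source is just one convenient way for the module to discover which region its inherited provider configuration points at:

# ./modules/per-region/main.tf

data "aws_region" "current" {
  # Reports the region of whichever aws provider configuration
  # this module instance inherited from its caller.
}

resource "aws_ssm_document" "example" {
  # No explicit "provider" argument, so this is associated with
  # the module's default aws provider configuration.
  name          = "example-document"
  document_type = "Command"
  content       = file("${path.module}/document.json")
}

resource "null_resource" "example" {
  triggers = {
    region = data.aws_region.current.name
  }

  depends_on = [aws_ssm_document.example]

  provisioner "local-exec" {
    command = "python3 ${path.module}/remediation_config.py add ${self.triggers.region}"
  }

  provisioner "local-exec" {
    when    = destroy
    command = "python3 ${path.module}/remediation_config.py remove ${self.triggers.region}"
  }
}

Each instance of this module then gets its own aws_ssm_document and null_resource, keyed to whatever region its inherited provider configuration uses.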

The key thing to know about both of the above approaches is that they will result in a single Terraform state containing a separate set of objects per region. If you took the first approach of just declaring duplicate resources inline then those objects would have addresses like this:

  • aws_ssm_document.us-east-1
  • aws_ssm_document.us-west-1
  • null_resource.us-east-1
  • null_resource.us-west-1

If you use the approach of having a shared module representing all of the common infrastructure for a region then they will instead be identified like this:

  • module.us-east-1.aws_ssm_document.example
  • module.us-east-1.null_resource.example
  • module.us-west-1.aws_ssm_document.example
  • module.us-west-1.null_resource.example

Either way though, Terraform will track all of the objects at separate addresses, tracking separate data in the state for each one. When you re-run Terraform after the initial creation, it will read the data about both SSM documents from the AWS API and it will check both of the null_resource objects separately to see if their triggers have changed.
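
If you ever want to confirm what Terraform is tracking, terraform state list prints the address of every object in the state, one per line. With the module-based layout above it would print something like:

terraform state list

module.us-east-1.aws_ssm_document.example
module.us-east-1.null_resource.example
module.us-west-1.aws_ssm_document.example
module.us-west-1.null_resource.example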

A configuration like the two above does have an important implication though: any time you run Terraform you are reading data about, and potentially applying changes to, all objects across all regions. If a particular region is currently having an outage, you may end up blocked from applying changes to another region. If you are using regions as part of a workload isolation strategy, it may be undesirable that any change made to that shared module must always be applied across all regions at once.

For that reason, there is another pattern that you can use which adds some additional workflow complexity but ensures that you work with each region separately: write a separate configuration per region, each of which includes only a provider block and a single module block calling the shared module for that region. For example, here's a configuration for just us-east-1:

provider "aws" {
  region = "us-east-1"
}

module "per_region" {
  source = "./modules/per-region"

  # This time we're just using the default provider
  # configuration throughout, so we don't need any
  # special provider configuration overrides.
}

Under this model, you'd work separately with each region. For example:

  • cd us-east-1
  • terraform apply
  • cd ../us-west-1
  • terraform apply

Rolling out the same change across multiple regions now takes more steps, though you could choose to automate that for the common case where you do want to apply to all regions each time. What the separation buys you is workflow flexibility: you can now choose to work with only one region at a time when needed, either because you want to do a gradual rollout of a risky change or because (as noted above) one region is currently having an outage and you need to make changes to the other regions to compensate.

An important detail about this multi-configuration approach is that each configuration will now have its own state too. This is a different way to arrive at the situation where the objects for each region are tracked as separate objects in Terraform: rather than namespacing them within a single Terraform state, we can instead track them in separate state snapshots altogether, achieving the same result of keeping them distinct but at a coarser level of granularity.
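
For example, assuming you keep these per-region configurations in sibling directories and store state in an S3 bucket (the bucket name and key layout here are assumptions, not something from your setup), each configuration would declare its own backend pointing at a distinct state object:

# us-east-1/backend.tf
terraform {
  backend "s3" {
    bucket = "example-terraform-state"
    key    = "per-region/us-east-1/terraform.tfstate"
    region = "us-east-1"
  }
}

The us-west-1 configuration would use the same bucket with a different key, so the two states never overlap.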


There is no "magic bullet" answer, and so you'll need to consider the options yourself and decide which approach makes the most sense for your particular problem. Details aside, the main thing to keep in mind here is that re-running the same configuration with a changed AWS provider region is not the correct usage pattern except in some very specific unusual cases. You will always want to make sure that Terraform is tracking all of your objects separately in the state, even if objects are duplicated between regions.

The same principle applies to anything else in a provider configuration that controls which set of API endpoints Terraform is using. For AWS, using credentials that refer to a different AWS account will often lead to the same situation, because many objects in AWS are namespaced per account or per account per region. Terraform cannot, in general, tell the difference between an object being deleted in the remote system and Terraform now being configured to ask a different endpoint where the object does not exist.

Upvotes: 5
