Ameo

Reputation: 2527

How can I configure Terraform to update a GCP compute engine instance template without destroying and re-creating?

I have a service deployed on GCP compute engine. It consists of a compute engine instance template, instance group, instance group manager, and a load balancer with its associated forwarding rules.

We're forced to use compute engine rather than Cloud Run or another serverless offering because the service needs docker-in-docker.

The deployment is managed by terraform. I have a config that looks something like this:

data "google_compute_image" "debian_image" {
  family  = "debian-11"
  project = "debian-cloud"
}

resource "google_compute_instance_template" "my_service_template" {
  name         = "my_service"
  machine_type = "n1-standard-1"

  disk {
    source_image = data.google_compute_image.debian_image.self_link
    auto_delete  = true
    boot         = true
  }
  ...
  metadata_startup_script = data.local_file.startup_script.content
  metadata = {
    MY_ENV_VAR = var.whatever
  }
}

resource "google_compute_region_instance_group_manager" "my_service_mig" {
  version {
    instance_template = google_compute_instance_template.my_service_template.id
    name              = "primary"
  }

  ...
}

resource "google_compute_region_backend_service" "my_service_backend" {
  ...

  backend {
    group = google_compute_region_instance_group_manager.my_service_mig.instance_group
  }
}

resource "google_compute_forwarding_rule" "my_service_frontend" {
  depends_on = [
    google_compute_region_instance_group_manager.my_service_mig,
  ]
  name = "my_service_ilb"

  backend_service       = google_compute_region_backend_service.my_service_backend.id
  
  ...
}

I'm running into issues where Terraform is unable to perform any kind of update to this service without running into conflicts. It seems that instance templates are immutable in GCP, and doing anything like updating the startup script, adding an env var, or similar forces it to be deleted and re-created.

Terraform prints info like this in that situation:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.connectors_compute_engine.google_compute_instance_template.airbyte_translation_instance1 must be replaced
-/+ resource "google_compute_instance_template" "my_service_template" {
      ~ id                      = "projects/project/..." -> (known after apply)
      ~ metadata                = { # forces replacement
          + "TEST"                             = "test"
            # (1 unchanged element hidden)
        }

The only solution I've found for getting out of this situation is to delete the entire service and all associated resources, from the load balancer down to the instance template, and re-create them.

Is there some way to avoid this situation so that I can change the instance template without having to manually update the terraform config twice? At this point I'd even accept some downtime for the service rather than a full rolling update, since that's effectively what's happening now anyway.

Upvotes: 2

Views: 3035

Answers (1)

aameti

Reputation: 144

I ran into this issue as well.

However, according to:

https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance_template#using-with-instance-group-manager

Instance Templates cannot be updated after creation with the Google Cloud Platform API. In order to update an Instance Template, Terraform will destroy the existing resource and create a replacement. In order to effectively use an Instance Template resource with an Instance Group Manager resource, it's recommended to specify create_before_destroy in a lifecycle block. Either omit the Instance Template name attribute, or specify a partial name with name_prefix.
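Following that recommendation, the template from the question could look something like this (a sketch only — the `my_service-` prefix is an assumption; any prefix works, since Terraform appends a unique suffix to it):

```hcl
resource "google_compute_instance_template" "my_service_template" {
  # name_prefix instead of name: Terraform generates a unique name,
  # so the replacement template never collides with the old one
  name_prefix  = "my_service-"
  machine_type = "n1-standard-1"

  disk {
    source_image = data.google_compute_image.debian_image.self_link
    auto_delete  = true
    boot         = true
  }

  lifecycle {
    # create the replacement template before destroying the old one,
    # so the instance group manager never points at a deleted template
    create_before_destroy = true
  }
}
```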

I would also test and plan with the prevent_destroy lifecycle meta-argument (note that prevent_destroy makes terraform plan error out on any change that forces replacement, rather than allowing the destroy):

  resource "google_compute_instance_template" "my_service_template" {
    ...

+   lifecycle {
+     prevent_destroy = true
+   }
  }

Or more realistically in your specific case, something like:

    resource "google_compute_instance_template" "my_service_template" {
      name_prefix  = "my_service-"
      machine_type = "n1-standard-1"

      ...

+     lifecycle {
+       create_before_destroy = true
+     }
    }

Run terraform plan with either create_before_destroy or prevent_destroy = true set on google_compute_instance_template, and review the planned actions before terraform apply.

As a last resort, you can remove google_compute_instance_template.my_service_template from the state file (terraform state rm) and import it back (terraform import).
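As a sketch of that state surgery (the project name and template self link below are placeholders — substitute the values from your own project):

```shell
# drop the template from Terraform state without touching the real resource
terraform state rm google_compute_instance_template.my_service_template

# re-import it under the same resource address
terraform import google_compute_instance_template.my_service_template \
  projects/my-project/global/instanceTemplates/my_service
```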

Some suggested workarounds in this thread:

terraform lifecycle prevent destroy

Upvotes: 3
