Angivare

Reputation: 476

Terraform forces AKS node pool replacement without any changes

I have the following resource definition for additional node pools in my k8s cluster:

resource "azurerm_kubernetes_cluster_node_pool" "extra" {
  for_each = var.node_pools

  kubernetes_cluster_id   = azurerm_kubernetes_cluster.k8s.id
  name                    = each.key
  vm_size                 = each.value["vm_size"]
  node_count              = each.value["count"]
  node_labels             = each.value["labels"]
  vnet_subnet_id          = var.subnet.id
}

Here is the output from terraform plan:

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the last "terraform apply":

  # module.aks.azurerm_kubernetes_cluster_node_pool.extra["general"] has been changed
  ~ resource "azurerm_kubernetes_cluster_node_pool" "extra" {
      + availability_zones     = []
        id                     = "/subscriptions/3913c9fe-c571-4af9-bc9a-533202d41061/resourcegroups/amic-resources/providers/Microsoft.ContainerService/managedClusters/amic-k8s-01/agentPools/general"
        name                   = "general"
      + node_taints            = []
      + tags                   = {}
        # (18 unchanged attributes hidden)
    }

Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to undo or respond to these changes.

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.aks.azurerm_kubernetes_cluster_node_pool.extra["general"] must be replaced
-/+ resource "azurerm_kubernetes_cluster_node_pool" "extra" {
      - availability_zones     = [] -> null
      - enable_auto_scaling    = false -> null
      - enable_host_encryption = false -> null
      - enable_node_public_ip  = false -> null
      ~ id                     = "/subscriptions/3913c9fe-c571-4af9-bc9a-533202d41061/resourcegroups/amic-resources/providers/Microsoft.ContainerService/managedClusters/amic-k8s-01/agentPools/general" -> (known after apply)
      ~ kubernetes_cluster_id  = "/subscriptions/3913c9fe-c571-4af9-bc9a-533202d41061/resourcegroups/amic-resources/providers/Microsoft.ContainerService/managedClusters/amic-k8s-01" -> "/subscriptions/3913c9fe-c571-4af9-bc9a-533202d41061/resourceGroups/amic-resources/providers/Microsoft.ContainerService/managedClusters/amic-k8s-01" # forces replacement
      - max_count              = 0 -> null
      ~ max_pods               = 30 -> (known after apply)
      - min_count              = 0 -> null
        name                   = "general"
      - node_taints            = [] -> null
      ~ orchestrator_version   = "1.20.7" -> (known after apply)
      ~ os_disk_size_gb        = 128 -> (known after apply)
      - tags                   = {} -> null
        # (9 unchanged attributes hidden)
    }

Plan: 1 to add, 0 to change, 1 to destroy.

As you can see, Terraform forces replacement of my node pool because of a change in kubernetes_cluster_id, even though the only difference between the two values is the casing of the resourcegroups path segment. I've been able to work around this by ignoring kubernetes_cluster_id changes in a lifecycle block (sketched below), but I am still puzzled as to why Terraform detects a change there.
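For reference, the workaround looks roughly like this (just a sketch; the remaining arguments are unchanged from the definition above):

resource "azurerm_kubernetes_cluster_node_pool" "extra" {
  for_each = var.node_pools

  kubernetes_cluster_id = azurerm_kubernetes_cluster.k8s.id
  # ... same arguments as in the definition above ...

  lifecycle {
    # Don't treat drift in the cluster ID as a change that needs an apply.
    ignore_changes = [kubernetes_cluster_id]
  }
}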

So why does Terraform report a change here when nothing has actually changed?

Upvotes: 5

Views: 1915

Answers (2)

Marcel Šerý

Reputation: 316

I fixed this weird bug by introducing a lifecycle block, as follows:

resource "azurerm_kubernetes_cluster_node_pool" "my-node-pool" {
  name = "mynodepool"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id

  ...

  lifecycle {
    ignore_changes = [
      kubernetes_cluster_id
    ]
  }
} 

Not the cleanest way, but it works. The cluster ID shouldn't change unless you recreate the whole AKS cluster, so ignoring it should be safe.

Upvotes: 1

Alin Valentin

Reputation: 547

I'm not proud of it, but I managed to work around this bug using Terraform's replace string function.

resource "azurerm_kubernetes_cluster_node_pool" "extra" {
  [...]
  # Use this once the bug gets fixed in the provider, and delete the workaround.
  # kubernetes_cluster_id   = azurerm_kubernetes_cluster.k8s.id 
  kubernetes_cluster_id   = replace(azurerm_kubernetes_cluster.k8s.id, "resourceGroups", "resourcegroups")

  [...]
}

NOTE: I'm searching for resourceGroups rather than /resourceGroups/ because replace treats a search string wrapped in forward slashes as a regular expression, which might end up duplicating your forward slashes. (I didn't test this myself.)
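To illustrate the difference (an untested sketch with a made-up resource ID; Terraform's replace treats a slash-wrapped search string as a regex, so the delimiting slashes are dropped from the pattern while the replacement string keeps its literal slashes):

locals {
  # Hypothetical resource ID, for illustration only.
  id = "/subscriptions/0000/resourceGroups/my-rg"

  # Plain-string search: only the casing of the segment changes.
  safe = replace(local.id, "resourceGroups", "resourcegroups")
  # => "/subscriptions/0000/resourcegroups/my-rg"

  # Slash-wrapped search is parsed as the regex resourceGroups; the
  # replacement's literal slashes then land next to the existing ones.
  risky = replace(local.id, "/resourceGroups/", "/resourcegroups/")
  # => "/subscriptions/0000//resourcegroups//my-rg"
}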

Upvotes: 2
