Rubber Duck

Reputation: 3743

Why has Terraform stopped working on a GCP project?

I had a Terraform deployment that deployed GKE cluster pools on GCP, and it stopped working with this error:

Error: Error applying plan:

1 error(s) occurred:

* google_container_cluster.primary: 1 error(s) occurred:

* google_container_cluster.primary: Post https://container.googleapis.com/v1/projects/...-gcp-poc/zones/europe-west1-d/clusters?alt=json: dial tcp: i/o timeout

I can still deploy manually via the console.

I can still deploy it with the gcloud CLI:

gcloud container clusters create cluster_name --zone europe-west1-b

I tried changing the credentials JSON file, to no avail.

It happened after upgrading the google plugin from 1.4 to 1.5. My Mac has been restarted since.

Upvotes: 0

Views: 1648

Answers (2)

Clemens

Reputation: 973

In my case I got the error (Error: Failed to create deployment: Post https://32.244.226.151/apis/apps/v1/namespaces/default/deployments: dial tcp 35.242.229.150:443: i/o timeout) when I tried to create a deployment for a cluster which I had just created (via terraform).

What solved the problem for me was reconnecting kubectl to the cluster:

gcloud container clusters list
gcloud container clusters get-credentials PUT_CLUSTER_NAME_HERE

UPDATE: I added this:

provider "kubernetes" {
  host                   = "${google_container_cluster.primary.endpoint}"
  client_certificate     = "${base64decode(google_container_cluster.primary.master_auth.0.client_certificate)}"
  client_key             = "${base64decode(google_container_cluster.primary.master_auth.0.client_key)}"
  cluster_ca_certificate = "${base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)}"
}

and

/**
 * Submit the job - Terraform doesn't yet support StatefulSets, so we have to
 * shell out.
 * See: https://github.com/sethvargo/vault-on-gke/blob/master/terraform/gcp.tf
 */
resource "null_resource" "apply" {
  depends_on = ["google_container_node_pool.primary_preemptible_nodes"]

  provisioner "local-exec" {
    command = <<EOF
gcloud container clusters get-credentials "${google_container_cluster.primary.name}" \
  --project="${google_container_cluster.primary.project}"

gcloud container clusters list
EOF
  }
}

which solved the issue completely for me. Note: my cluster resource is resource "google_container_cluster" "primary" { ... }

Upvotes: 0

Rubber Duck

Reputation: 3743

I ended up deleting the .terraform folder and replacing it with an old one I had that contained google plugin 1.4, then ran:

terraform init
terraform plan
terraform apply

This worked even though I got this error:

Error: Error applying plan:

1 error(s) occurred:

* google_container_cluster.rtp_container_cluster: 1 error(s) occurred:

* google_container_cluster.rtp_container_cluster: Error reading instance group manager returned as an instance group URL: Get https://www.googleapis.com/compute/v1/projects/rtp-gcp-poc/zones/europe-west1-b/instanceGroupManagers/gke-rtp-container-cluste-default-pool-8bb9aa85-grp?alt=json: dial tcp: i/o timeout

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

Then I connected via kubectl:

➜  ~ kubectl get node
NAME                                                  STATUS    ROLES     AGE       VERSION
gke-...-container-cluste-default-pool-8bb9aa85-7kcb   Ready     <none>    14m       v1.8.6-gke.0

I ran

terraform apply

again and, presto, the deployment finished.

Since I have excellent connectivity, this smells like a bug in the google plugin to me.
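
If rolling back to plugin 1.4 is what fixes it, a version constraint should keep terraform init from pulling 1.5 again on a fresh .terraform folder. A minimal sketch, assuming the Terraform 0.11-style provider syntax used elsewhere in this thread and that your existing provider arguments stay as they are:

```hcl
# Assumption: 0.11-era syntax, where the constraint lives in the provider block.
# "~> 1.4.0" allows any 1.4.x release but excludes 1.5 and later.
provider "google" {
  version = "~> 1.4.0"

  # Keep your existing credentials / project / region arguments here.
}
```

After adding the constraint, run terraform init again so the matching plugin version is installed.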

Upvotes: 0
