user630702

Reputation: 3167

Prevent GCP maintenance from restarting GKE cluster

It seems the GKE cluster gets restarted every week. Is there anything I can do to prevent that? It does migrate pods to other nodes while it performs maintenance on one node, but I'm not sure whether there is downtime during the migration, and sometimes the pods get stuck in a CrashLoopBackOff or ErrImagePull state.

How does the migration happen during maintenance? When the total number of replicas is just one, does it create a new pod, route traffic to it, and then delete the old pod? I just want to know whether there is downtime. It's a new cluster and monitoring hasn't been set up yet, so I don't know if players are experiencing downtime during maintenance.

Is there a way to prevent GCP from doing maintenance? I used Terraform to create the cluster, so if it can be prevented I need to do it via Terraform, since GKE nodes can't be edited using the GCP console.

Upvotes: 1

Views: 1723

Answers (2)

ppp0

Reputation: 1

I had a similar issue where nodes would get randomly restarted. Turned out I had to deploy a new node pool with Compact placement set to off.

Compact placement does not allow for live migration of nodes in case of a maintenance event. Instead, the node in question will be terminated and restarted.

Upvotes: 0

dustinmoris

Reputation: 3361

You can configure your maintenance windows and enable/disable automatic node upgrades.

Here's an example of the configuration options in the GCP console:

[Screenshot: configuration options in the GCP console]

You can also choose which release channel you want to be on (rapid, regular, or stable).
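
Since the cluster in the question is managed with Terraform, the same settings could be expressed roughly as below. This is only a sketch: the resource names, cluster name, region, and window times are placeholders I made up, and it assumes the `hashicorp/google` provider is already configured. As far as I know, GKE requires a minimum amount of maintenance availability per month, so a window that is too narrow may be rejected. Maintenance cannot be switched off entirely this way, only scheduled.

```hcl
# Hypothetical names and values throughout; adjust to your own cluster.
resource "google_container_cluster" "primary" {
  name     = "example-cluster"
  location = "us-central1" # a region here gives a regional cluster

  remove_default_node_pool = true
  initial_node_count       = 1

  # Stable receives new versions last, so upgrades are less frequent.
  release_channel {
    channel = "STABLE"
  }

  # Restrict automatic maintenance to an 8-hour window on weekends (UTC).
  # The first start_time sets the time of day; recurrence sets the days.
  maintenance_policy {
    recurring_window {
      start_time = "2024-01-06T00:00:00Z"
      end_time   = "2024-01-06T08:00:00Z"
      recurrence = "FREQ=WEEKLY;BYDAY=SA,SU"
    }
  }
}

resource "google_container_node_pool" "primary_nodes" {
  name       = "example-pool"
  cluster    = google_container_cluster.primary.name
  location   = google_container_cluster.primary.location
  node_count = 1

  management {
    auto_repair = true
    # Clusters enrolled in a release channel need auto_upgrade = true;
    # setting it to false is only accepted outside a release channel.
    auto_upgrade = true
  }
}
```

With this in place, automatic upgrades should still happen, but only inside the configured window.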

Your Kubernetes control plane will have downtime during upgrades if you have a zonal cluster. Only regional clusters replicate the control plane.

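As for your own applications, they should see zero downtime as long as they run more than one replica: GKE automatically creates new nodes and only diverts traffic to new pods once they are ready to receive it. With a single replica, expect a short interruption while the pod is rescheduled onto another node.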
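
To make sure a node drain never takes every replica down at once, a PodDisruptionBudget can help. The sketch below is written in Terraform to match the question and assumes the `hashicorp/kubernetes` provider is configured against the cluster. The resource name and the `app = "game-server"` label are hypothetical and must match the labels on your own pods.

```hcl
# Hypothetical name and labels; the selector must match your Deployment's pods.
resource "kubernetes_pod_disruption_budget_v1" "game_server" {
  metadata {
    name = "game-server-pdb"
  }

  spec {
    # Keep at least one matching pod running during voluntary disruptions
    # such as node drains for upgrades.
    min_available = "1"

    selector {
      match_labels = {
        app = "game-server"
      }
    }
  }
}
```

This only helps when the Deployment runs two or more replicas. With a single replica, a budget of `min_available = "1"` leaves no pod that can be evicted, so the drain is held up rather than completed without downtime.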

Upvotes: 4
