Reputation: 1417
I'm wondering the graceful way to reduce nodes in a Kubernetes cluster on GKE.
I have some nodes each of which has some pods watching a shared job queue and executing a job. I also have the script which monitors the length of the job queue and increase the number of instances when the length exceeds a threshold by executing gcloud compute instance-groups managed resize
command and it works ok.
But I don't know the graceful way to reduce the number of instances when the length falls below the threshold.
Is there any good way to stop the pods working on the terminating instance before the instance gets terminated? or any other good practice?
Note
Upvotes: 1
Views: 1228
Reputation: 155
Before resizing the cluster, let's set the project context in the cloud shell by running the below commands:
gcloud config set project [PROJECT_ID]
gcloud config set compute/zone [COMPUTE_ZONE]
gcloud config set compute/region [COMPUTE_REGION]
gcloud components update
Note: You can also set project, compute zone & region as flags in the below command using --project, --zone, and --region operational flags
gcloud container clusters resize [CLUSTER_NAME] --node-pool [POOL_NAME] --num-nodes [NUM_NODES]
Run the above command for each node pool. You can omit the --node-pool flag if you have only one node pool.
Reference: https://cloud.google.com/kubernetes-engine/docs/how-to/resizing-a-cluster
Upvotes: 1
Reputation: 80
I think the best approach is instead of using a pod to run your tasks, use the kubernetes job object. That way when the task is completed the job terminates the container. You would only need a small pod that could initiate kubernetes jobs based on the queue.
The more kube jobs that get created, the more resources will be consumed and the cluster auto-scaler will see that it needs to add more nodes. A kube job will need to complete even if it gets terminated, it will get re-scheduled to complete.
There is no direct information in the GKE docs about whether a downsize will happen if a Job is running on the node, but the stipulation seems to be if a pod can be easily moved to another node and the resources are under-utilized it will drain the node.
Refrences
Upvotes: 1