Reputation: 17486
I want to run a set of long-running tasks in batch using the Horizontal Pod Autoscaler. These tasks can take a few minutes or, in some cases, a few hours to run, and they always use 80-100% of the available CPU resources.
I want to understand Autoscaler's behavior when it decides it is time to scale down the fleet.
Is there a way to prevent this from happening by prioritizing the pods with the lowest CPU utilization for scale-down first? That way, the pods that are still processing work would be left untouched.
Upvotes: 4
Views: 1608
Reputation: 13260
I am not aware of a way to customize which replicas in a deployment should be deleted when scaling down the number of replicas.
Maybe you can solve your problem by setting `terminationGracePeriodSeconds` and using the `preStop` hook.
With `terminationGracePeriodSeconds` you can specify how long the containers in a pod will wait between when the `SIGTERM` signal is sent and when the `SIGKILL` signal is sent. This is suboptimal for you because, as far as I understand, you don't know how long it will take the pod to complete its assigned tasks.
But if you set this value high enough, you can leverage the `preStop` hook as well. From the documentation:
PreStop is called immediately before a container is terminated due to an API request or management event such as liveness/startup probe failure, preemption, resource contention, etc. The handler is not called if the container crashes or exits. The reason for termination is passed to the handler. The Pod's termination grace period countdown begins before the PreStop hook is executed. Regardless of the outcome of the handler, the container will eventually terminate within the Pod's termination grace period. Other management of the container blocks until the hook completes or until the termination grace period is reached.
If you can run a command from within the container that blocks until the container has finished its work, you should be able to make it terminate only when it is idle.
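As a minimal sketch of how the two settings could fit together: the manifest below assumes the worker process creates a marker file (here `/tmp/idle`) once it has finished its current task; the image name, file path, and polling loop are illustrative assumptions, not something from the original answer.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
spec:
  # Allow up to 4 hours between SIGTERM and SIGKILL so long tasks can finish.
  terminationGracePeriodSeconds: 14400
  containers:
  - name: worker
    image: example.com/batch-worker:latest  # hypothetical image
    lifecycle:
      preStop:
        exec:
          # Block termination until the worker signals it is idle.
          # The /tmp/idle marker file is an assumed convention: the worker
          # would need to create it when it has no task in progress.
          command: ["/bin/sh", "-c", "while [ ! -f /tmp/idle ]; do sleep 5; done"]
```

With this in place, a scale-down still sends `SIGTERM`, but the `preStop` hook holds the container open (up to the grace period) until the marker file appears, so an in-flight task is not cut short unless it exceeds the grace period.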
Let me also link a nice blog post explaining how the whole thing works: https://pracucci.com/graceful-shutdown-of-kubernetes-pods.html
Upvotes: 4