In Kubernetes, how can I safely tear down Long Running Processes?

Question

I have a system where long-running tasks are done by processing messages from a message queue. Each task does significant processing on a large video.

The problem arises in the following steps:

1. A process in a pod takes a message off the queue and starts processing the video, which takes minutes.
2. A developer makes a change and releases it, and a Kubernetes Deployment rollout starts.
3. During the rollout, the long-running process is killed and replaced by a new pod, which loses all the work done so far.

Is there a mechanism to work around this in Kubernetes? Some kind of check to ensure that the worker in the pod is in a state where it can be destroyed safely? Almost something like a destroyProbe (the opposite of a readinessProbe)?
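
To make the scenario concrete, here is a minimal sketch of the kind of worker I mean, written in Python. It relies on the fact that Kubernetes sends SIGTERM to the container when a pod is terminated and only force-kills it after the pod's terminationGracePeriodSeconds has elapsed. The names run_worker, process_video and shutdown_requested are hypothetical, an in-memory queue.Queue stands in for the real message queue, and the sketch assumes the grace period is set longer than the slowest job. It is an illustration under those assumptions, not a confirmed solution.

```python
import queue
import signal
import threading

# Set when the pod is asked to terminate (Kubernetes sends SIGTERM first).
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Only record the request; the worker loop decides when it is safe to exit.
    shutdown_requested.set()


def install_handler():
    # Call once at startup, from the main thread of the worker process.
    signal.signal(signal.SIGTERM, handle_sigterm)


def process_video(message):
    # Hypothetical stand-in for the minutes-long video processing step.
    return f"processed {message}"


def run_worker(jobs, results, process=process_video):
    # Shutdown is checked only between messages, so a job that has
    # already started is always allowed to run to completion.
    while not shutdown_requested.is_set():
        try:
            message = jobs.get(timeout=0.1)
        except queue.Empty:
            continue
        results.append(process(message))
        jobs.task_done()
```

With this shape, a SIGTERM that arrives mid-job lets the current video finish and leaves the remaining messages on the queue for the replacement pod, provided the grace period is long enough.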


Answers (1)
