How will a scheduled (rolling) restart of a service be affected by an ongoing upgrade (and vice versa)?

Question

Due to a memory leak in one of our services, I am planning to add a k8s CronJob that periodically restarts the leaking service. Right now we do not have the resources to investigate the memory leak properly, so we need a temporary solution that quickly minimizes the problems it causes. It will be a rolling restart, as outlined in the question "How to schedule pods restart".

I have already tested this in our test cluster, and it seems to work as expected. The service has 2 replicas in test and 3 in production.

My plan is to schedule the CronJob to run every 2 hours.
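
For reference, a CronJob of the kind described might look like the sketch below, written as a Python function that emits the manifest as JSON (kubectl accepts JSON manifests as well as YAML). It is a sketch under assumptions: the Deployment name leaky-service, the service account deployment-restarter (which would need RBAC permission to at least get and patch Deployments), and the bitnami/kubectl image are hypothetical placeholders. The schedule "0 */2 * * *" fires at minute 0 of every even hour. Note that concurrencyPolicy: Forbid only stops two restart Jobs from overlapping each other; kubectl rollout restart returns as soon as its patch is accepted, so it does not stop a restart from overlapping an upgrade that is already rolling out.

```python
import json

# Hypothetical placeholders: adjust to your cluster.
DEPLOYMENT = "leaky-service"               # hypothetical Deployment name
SERVICE_ACCOUNT = "deployment-restarter"   # hypothetical SA allowed to get/patch Deployments
KUBECTL_IMAGE = "bitnami/kubectl:latest"   # assumption: any image that ships kubectl


def restart_cronjob(deployment: str, schedule: str = "0 */2 * * *") -> dict:
    """Build a batch/v1 CronJob that runs `kubectl rollout restart` on a schedule."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": f"{deployment}-restart"},
        "spec": {
            # Minute 0 of every second hour: 00:00, 02:00, ..., 22:00.
            "schedule": schedule,
            # Skip a run if the previous restart Job is still running.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {
                "spec": {
                    "backoffLimit": 2,
                    "template": {
                        "spec": {
                            "serviceAccountName": SERVICE_ACCOUNT,
                            "restartPolicy": "Never",
                            "containers": [
                                {
                                    "name": "kubectl",
                                    "image": KUBECTL_IMAGE,
                                    "command": [
                                        "kubectl", "rollout", "restart",
                                        f"deployment/{deployment}",
                                    ],
                                }
                            ],
                        }
                    },
                }
            },
        },
    }


if __name__ == "__main__":
    # Pipe into `kubectl apply -f -` to create the CronJob.
    print(json.dumps(restart_cronjob(DEPLOYMENT), indent=2))
```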

I am now wondering: how will the new CronJob behave if it happens to execute while a service upgrade is already running? We do rolling upgrades to achieve zero downtime, and we sometimes roll out upgrades several times a day. I don't want to restrict the people who deploy upgrades by saying "please make sure you never deploy near 08:00, 10:00, 12:00, etc."; that will never work in the long term.

Conversely, I am also wondering what will happen if an upgrade is started while the CronJob is already running and the pods are restarting.

Does Kubernetes have something built in to handle this kind of conflict?

David Maze · Accepted Answer

An answer to the linked question recommends running kubectl rollout restart from a CronJob pod. That command works internally by adding a kubectl.kubernetes.io/restartedAt annotation to the pod template in the Deployment's spec; since the pod spec is now different, it triggers a new rolling upgrade of the Deployment.
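
To make the mechanism concrete, here is a minimal Python sketch of a patch body equivalent to what kubectl rollout restart sends for a Deployment: it sets the kubectl.kubernetes.io/restartedAt annotation on the pod template (spec.template.metadata.annotations) to the current time as an RFC 3339 timestamp. The helper name restart_patch is hypothetical; kubectl builds this internally, and the exact timestamp it writes may differ slightly (for example, local time with an offset rather than UTC).

```python
import json
from datetime import datetime, timezone

# Annotation key that `kubectl rollout restart` writes on the pod template.
RESTART_ANNOTATION = "kubectl.kubernetes.io/restartedAt"


def restart_patch(now: datetime) -> dict:
    """Return a patch that changes only the pod template's restart annotation.

    Any change to spec.template gives the Deployment a new pod-template hash,
    which is what makes the controller start a rolling update.
    """
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "spec": {
            "template": {
                "metadata": {"annotations": {RESTART_ANNOTATION: stamp}}
            }
        }
    }


if __name__ == "__main__":
    # Could be applied by hand (hypothetical Deployment and file names):
    #   kubectl patch deployment leaky-service -p "$(python restart_patch.py)"
    print(json.dumps(restart_patch(datetime.now(timezone.utc))))
```

Because the patch touches only that one annotation, it leaves the image and every other field of the pod template alone.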

Say you're running an ordinary redeployment; that changes the image: setting in the pod spec. At about the same time, kubectl rollout restart runs and changes an annotation in the pod spec. The Kubernetes API forces these two changes to be serialized, so the final Deployment object will always contain both of them.
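
A toy illustration of why the order does not matter: the sketch below applies two patches to a simplified Deployment object using JSON merge patch semantics (RFC 7386), once in each order, and checks that both results contain the new image and the restart annotation. This is a simplification: kubectl patch defaults to a strategic merge patch (which, for example, merges the containers list by name instead of replacing it), and full-object updates are additionally guarded by resourceVersion, so a write based on a stale copy is rejected with a conflict rather than silently overwriting the other change. The object contents, image tags and timestamp are hypothetical.

```python
import copy


def merge_patch(target, patch):
    """Apply an RFC 7386 JSON merge patch and return a new object."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)          # non-objects replace the target
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)            # null means "delete this key"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Hypothetical, heavily trimmed Deployment.
BASE = {"spec": {"template": {
    "metadata": {"labels": {"app": "leaky-service"}},
    "spec": {"containers": [{"name": "app", "image": "registry.example/leaky:1.0"}]},
}}}
# Ordinary redeployment: only the image changes.
UPGRADE = {"spec": {"template": {"spec": {"containers": [
    {"name": "app", "image": "registry.example/leaky:1.1"}]}}}}
# Scheduled restart: only the annotation changes.
RESTART = {"spec": {"template": {"metadata": {"annotations": {
    "kubectl.kubernetes.io/restartedAt": "2024-01-01T08:00:00Z"}}}}}


def apply_in_order(obj, patches):
    """Apply patches one after another, as a serializing API server would."""
    for patch in patches:
        obj = merge_patch(obj, patch)
    return obj


upgrade_then_restart = apply_in_order(BASE, [UPGRADE, RESTART])
restart_then_upgrade = apply_in_order(BASE, [RESTART, UPGRADE])
print(upgrade_then_restart == restart_then_upgrade)
```

Since the two patches touch disjoint fields, either serialization order yields the same final pod template.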

This question then reduces to "what happens if a deployment changes and needs to trigger a redeployment while a redeployment is already running?" The Deployment documentation covers this case in its "Rollover (aka multiple updates in-flight)" section: the controller starts deploying new pods from the newest version of the pod spec and treats all older versions as "old", so a pod in the intermediate state might exist for only a couple of minutes before being replaced.
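
The rollover behaviour can be sketched with a toy model. This is not the real Deployment controller: it assumes pods become ready instantly, uses maxSurge=1 and maxUnavailable=0, and scales old ReplicaSets down oldest first. The ReplicaSet names v1, v2 and v2+restart are hypothetical labels for the original pod template, the upgraded image, and the upgraded image plus the restart annotation. The upgrade gets one step in before the restart lands; from then on only v2+restart is scaled up and both older ReplicaSets are drained, so the intermediate v2 pod is short-lived.

```python
def step(replicasets: dict, new: str, desired: int, max_surge: int = 1) -> None:
    """One toy reconcile step: surge the newest ReplicaSet, then drain old ones."""
    total = sum(replicasets.values())
    # Scale up the newest ReplicaSet, never exceeding desired + max_surge pods.
    if replicasets[new] < desired and total < desired + max_surge:
        replicasets[new] += 1
        total += 1
    # Every other ReplicaSet counts as "old"; drain them oldest first (dict order)
    # until we are back at the desired count (maxUnavailable = 0).
    for name in replicasets:
        if name != new and total > desired and replicasets[name] > 0:
            replicasets[name] -= 1
            total -= 1


def rollout(replicasets: dict, new: str, desired: int, max_steps: int = 100) -> dict:
    """Run steps until nothing changes or max_steps is reached."""
    for _ in range(max_steps):
        before = dict(replicasets)
        step(replicasets, new, desired)
        if replicasets == before:
            break
    return replicasets


def simulate() -> dict:
    desired = 3
    rs = {"v1": 3}                           # steady state before anything happens
    rs["v2"] = 0                             # upgrade: new image, new ReplicaSet
    rollout(rs, "v2", desired, max_steps=1)  # restart lands mid-upgrade
    rs["v2+restart"] = 0                     # restart annotation: newest template
    return rollout(rs, "v2+restart", desired)


if __name__ == "__main__":
    print(simulate())  # {'v1': 0, 'v2': 0, 'v2+restart': 3}
```

Swapping the roles (restart first, upgrade second) gives the same shape of result, which is the "vice versa" case from the question.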

In short: this should work consistently and you shouldn't need to take any special precautions.
