Reputation: 1523
I have a CronJob where the pod it starts ends up in ImagePullBackOff, and the CronJob never schedules another pod, even though it should according to its schedule. Is there a way to force the CronJob controller to schedule another pod even though the previous one ended up in ImagePullBackOff?
I don't want multiple pods running at the same time, so I use concurrencyPolicy: Forbid. Is there any way to get the CronJob to still schedule another pod?
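For reference, the CronJob spec looks roughly like this (image, names and schedule are placeholders):

apiVersion: batch/v1            # batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: my-cronjob
spec:
  schedule: "*/10 * * * *"
  concurrencyPolicy: Forbid     # don't run jobs concurrently
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: job
            image: registry.example.com/my-image:1.2.3   # this pull fails, pod goes to ImagePullBackOff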
Upvotes: 3
Views: 1755
Reputation: 6147
You don't really want the scheduler to schedule another pod. Doing that would lead to a resource leak, as explained in Infinite ImagePullBackOff CronJob results in resource leak, which @VonC mentioned in his answer.
Instead you should focus on fixing the root cause of why the pod is in ImagePullBackOff. Once that is done, Kubernetes will automatically pull the image and run the pod, and a new one will be scheduled once the cron schedule is fulfilled.
ImagePullBackOff means that the container could not start because the image could not be retrieved. The reason could be, for example, an invalid image ID or tag, a missing or invalid imagePullSecret, or network connectivity issues.
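For example, if the cause is a missing pull secret, the relevant part of the CronJob spec would look something like this (secret, image and container names are placeholders):

spec:
  jobTemplate:
    spec:
      template:
        spec:
          imagePullSecrets:
          - name: regcred        # a docker-registry secret that exists in the same namespace
          containers:
          - name: job
            image: registry.example.com/my-image:1.2.3   # also double-check that this tag exists
          restartPolicy: Never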
When a pod is in ImagePullBackOff, Kubernetes will periodically retry pulling the image, and once the image is successfully pulled the pod starts. The delay between pull attempts increases with each attempt (hence the "BackOff"), as explained in the docs:
Kubernetes raises the delay between each attempt until it reaches a compiled-in limit, which is 300 seconds (5 minutes).
Upvotes: 2
Reputation: 1324278
Using concurrencyPolicy: Forbid is one of the workarounds for that "feature" (rescheduling a pod after an ImagePullBackOff).
See kubernetes/kubernetes issue 76570, which illustrates a drawback of said feature:
What happened:
A CronJob without a ConcurrencyPolicy or history limit that uses an image that doesn't exist will slowly consume almost all cluster resources.
In our cluster we started hitting the pod limit on all of our nodes, and began losing our ability to schedule new pods.
What you expected to happen:
Even without a ConcurrencyPolicy, CronJob should probably have the same behavior as most of the other pod schedulers.
If I try to start a deployment with X replicas and I get ImagePullBackOff on one of the containers in a pod, the deployment won't keep trying to schedule more pods on different nodes until it consumes all cluster resources.
This is especially bad with CronJob, because unlike Deployment, where an upper limit for horizontal scalability has to be set, CronJob with no history limit and ConcurrencyPolicy will slowly consume all resources on a cluster.
While this is up for debate, I would personally say when a scheduled Job has the ImagePullBackOff error, it shouldn't try to keep scheduling new pods. It should probably kill the pod trying to pull an image and make a new one, or wait for the pod to successfully pull the image.
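To bound the blast radius the issue describes, you can set the concurrency policy and history limits explicitly on the CronJob. A minimal sketch of the relevant spec fields (values are illustrative):

spec:
  concurrencyPolicy: Forbid          # never start a new Job while the previous one is still active
  successfulJobsHistoryLimit: 3      # prune finished Jobs so they don't pile up
  failedJobsHistoryLimit: 1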
Upvotes: 1