Reputation: 1523
I have a CronJob where the pod it starts ends up in ImagePullBackOff, and the CronJob never schedules another pod, even though it should according to its schedule. Is there a way to force the CronJob controller to schedule another pod even though the previous one ended up in ImagePullBackOff?
I don't want multiple pods running at the same time, so I use concurrencyPolicy: Forbid. Is there any way to get the CronJob to still schedule another pod?
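For reference, the CronJob spec looks roughly like this (image, names and schedule are placeholders):

apiVersion: batch/v1            # batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: my-cronjob
spec:
  schedule: "*/10 * * * *"
  concurrencyPolicy: Forbid     # don't run jobs concurrently
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: job
            image: registry.example.com/my-image:1.2.3   # this pull fails, pod goes to ImagePullBackOff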
Upvotes: 3
Views: 1755
Reputation: 6147
You don't really want the scheduler to schedule another pod. Doing that would lead to a resource leak, as explained in Infinite ImagePullBackOff CronJob results in resource leak, which @VonC mentioned in his answer.
Instead you should focus on fixing the root cause of why the pod is in ImagePullBackOff. Once that is done, Kubernetes will automatically pull the image and run the pod, and a new one will be scheduled once the cron schedule is fulfilled.
ImagePullBackOff means that the container could not start because the image could not be retrieved. The reason could be, for example, an invalid image ID or tag, a missing or invalid imagePullSecret, or network connectivity issues.
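For example, if the cause is a missing pull secret, the relevant part of the CronJob spec would look something like this (secret, image and container names are placeholders):

spec:
  jobTemplate:
    spec:
      template:
        spec:
          imagePullSecrets:
          - name: regcred        # a docker-registry secret that exists in the same namespace
          containers:
          - name: job
            image: registry.example.com/my-image:1.2.3   # also double-check that this tag exists
          restartPolicy: Never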
When a pod is in ImagePullBackOff, Kubernetes will periodically retry pulling the image, and once the image is successfully pulled the pod starts. The delay between pull attempts increases with each attempt (hence the "BackOff"), as explained in the docs:
Kubernetes raises the delay between each attempt until it reaches a compiled-in limit, which is 300 seconds (5 minutes).
Upvotes: 2
Reputation: 1324278
Using concurrencyPolicy: Forbid is one of the workarounds for that "feature" (rescheduling a pod after an ImagePullBackOff).
See kubernetes/kubernetes issue 76570, which illustrates a drawback of said feature:
What happened:
A CronJob without a ConcurrencyPolicy or history limit that uses an image that doesn't exist will slowly consume almost all cluster resources.
In our cluster we started hitting the pod limit on all of our nodes, and began losing our ability to schedule new pods.
What you expected to happen:
Even without a ConcurrencyPolicy, CronJob should probably have the same behavior as most of the other pod schedulers.
If I try to start a deployment with X replicas and I get ImagePullBackOff on one of the containers in a pod, the deployment won't keep trying to schedule more pods on different nodes until it consumes all cluster resources.
This is especially bad with CronJob, because unlike Deployment, where an upper limit for horizontal scalability has to be set, CronJob with no history limit and ConcurrencyPolicy will slowly consume all resources on a cluster.
While this is up for debate, I would personally say when a scheduled Job has the ImagePullBackOff error, it shouldn't try to keep scheduling new pods. It should probably kill the pod trying to pull an image and make a new one, or wait for the pod to successfully pull the image.
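To bound the blast radius the issue describes, you can set the concurrency policy and history limits explicitly on the CronJob. A minimal sketch of the relevant spec fields (values are illustrative):

spec:
  concurrencyPolicy: Forbid          # never start a new Job while the previous one is still active
  successfulJobsHistoryLimit: 3      # prune finished Jobs so they don't pile up
  failedJobsHistoryLimit: 1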
Upvotes: 1