Reputation: 1
I am trying to figure out whether there is a way to force a pod that is stuck in the ContainerCreating state (for valid reasons, such as being unable to mount an inaccessible NFS share) to move to a failed state after a specific amount of time.
I have Kubernetes jobs that I'm running through a Jenkins pipeline. I'm using the job state (type: Complete|Failed) to determine the outcome, and then I gather the results of the jobs (kubectl get pods + kubectl logs). It works well as long as the pods go into a known failed state like ContainerCannotRun or BackoffLimitExceeded, so that the job state goes to Failed.
The problem arises when a pod goes into the ContainerCreating state and stays that way. The job state then stays active and never changes. Is there something I can put in the job manifest to force a pod that's in the ContainerCreating state to move to a failed state after a certain amount of time?
Example pod status:

- image: myimage
  imageID: ""
  lastState: {}
  name: primary
  ready: false
  restartCount: 0
  state:
    waiting:
      reason: ContainerCreating
hostIP: x.y.z.y
phase: Pending
qosClass: BestEffort
startTime: "2020-05-06T17:09:58Z"

Job status:

active: 1
startTime: "2020-05-06T17:09:58Z"
Thanks for any input.
Upvotes: 0
Views: 631
Reputation: 44559
As described in the Kubernetes Job documentation, use activeDeadlineSeconds or backoffLimit.
The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.
Once backoffLimit has been reached, the Job will be marked as failed and any running Pods will be terminated.
Note that a Job’s activeDeadlineSeconds takes precedence over its backoffLimit. Therefore, a Job that is retrying one or more failed Pods will not deploy additional Pods once it reaches the time limit specified by activeDeadlineSeconds, even if the backoffLimit is not yet reached.
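On the pipeline side, the job's final state can be read from status.conditions in the output of kubectl get job -o json. The following is only a minimal sketch of that check, assuming the JSON has already been parsed into a dict; classify_job and the sample data are hypothetical names and values used for illustration.

```python
import json


def classify_job(job):
    """Return (state, reason) for a parsed Job object.

    state is "complete", "failed", or "active"; reason is the condition's
    reason string (for example "DeadlineExceeded") or None.
    """
    conditions = job.get("status", {}).get("conditions") or []
    for cond in conditions:
        # Only conditions whose status is "True" are currently in effect.
        if cond.get("status") != "True":
            continue
        if cond.get("type") == "Failed":
            return "failed", cond.get("reason")
        if cond.get("type") == "Complete":
            return "complete", cond.get("reason")
    # No terminal condition yet: the job is still running or pending.
    return "active", None


# Hypothetical sample mirroring a job that hit activeDeadlineSeconds.
sample = json.loads("""
{"status": {"conditions": [
  {"type": "Failed", "status": "True", "reason": "DeadlineExceeded"}
]}}
""")
print(classify_job(sample))
```

With a deadline set, a job whose pod never leaves ContainerCreating should eventually report a Failed condition, so a check like this no longer sees the job as active forever.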
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-timeout
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 100
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
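When the deadline in the manifest above fires, it can be useful for the pipeline to report which container never started. The sketch below is one possible way to do that, assuming the pod has been fetched with kubectl get pod -o json and parsed into a dict shaped like the pod status in the question. stuck_creating is a hypothetical helper, and the 100-second limit is chosen only to match the manifest.

```python
from datetime import datetime, timezone


def stuck_creating(pod, now, limit_seconds):
    """List containers still waiting in ContainerCreating after limit_seconds."""
    status = pod.get("status", {})
    start = status.get("startTime")
    if start is None:
        return []
    # Kubernetes reports timestamps in UTC, e.g. "2020-05-06T17:09:58Z".
    started = datetime.strptime(start, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    if (now - started).total_seconds() < limit_seconds:
        return []
    stuck = []
    for cs in status.get("containerStatuses") or []:
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") == "ContainerCreating":
            stuck.append(cs.get("name"))
    return stuck


# Sample data modeled on the pod status from the question.
pod = {
    "status": {
        "startTime": "2020-05-06T17:09:58Z",
        "containerStatuses": [
            {"name": "primary",
             "state": {"waiting": {"reason": "ContainerCreating"}}}
        ],
    }
}
now = datetime(2020, 5, 6, 17, 12, 0, tzinfo=timezone.utc)
print(stuck_creating(pod, now, 100))
```

This only reads status that Kubernetes already exposes; it does not change the pod's state, which is still handled by activeDeadlineSeconds on the job.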
Upvotes: 1