sborpo

Reputation: 948

Kubernetes Job Pods End Up in "Unknown" State

I'm using the K3s distribution of Kubernetes which is deployed on a Spot EC2 Instance in AWS.

I have scheduled a processing job, and sometimes this job is terminated and ends up in the "Unknown" state (the job's code is abnormally terminated).

When I run

kubectl describe pod <pod_name>

it shows this:

 State:          Terminated
      Reason:       Unknown
      Exit Code:    255
      Started:      Wed, 06 Jan 2021 21:13:29 +0000
      Finished:     Wed, 06 Jan 2021 23:33:46 +0000

The AWS logs show that CPU consumption was at 99% right before the crash. From a number of sources (1, 2, 3) I saw that this can cause a node crash, but I haven't been able to confirm that here. What may be the reason?

Thanks!

Upvotes: 3

Views: 2009

Answers (1)

Wytrzymały Wiktor

Reputation: 13878

The actual state of the Job is Terminated with the reason Unknown. In order to debug this situation, you need to get the relevant logs from the Pods created by your Job.

When a Job completes, no more Pods are created, but the Pods are not deleted either. Keeping them around allows you to still view the logs of completed pods to check for errors, warnings, or other diagnostic output.
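For example, the retained Pods can be listed via the job-name label that the Job controller adds to them (processing-job below is just a placeholder for your Job's name):

    # list all Pods created by the Job, including completed or failed ones
    kubectl get pods --selector=job-name=processing-job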

You can also execute kubectl describe job $JOB to see the Pods' names under the Events section, and then execute kubectl logs $POD.
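Put together, a minimal sequence could look like this (processing-job is again a placeholder; the --previous flag only returns something if the container was restarted):

    # the Pod names appear in the Events section of the Job description
    kubectl describe job processing-job

    # fetch the logs of one of those Pods
    kubectl logs <pod_name>

    # if the container was restarted, the logs of the previous instance
    # can be retrieved as well
    kubectl logs <pod_name> --previous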

If that isn't enough, you can try different ways to Debug Pods (sketched briefly after the list below), such as:

  • Debugging with container exec

  • Debugging with an ephemeral debug container, or

  • Debugging via a shell on the node
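A rough sketch of those three approaches, using placeholder names (my-pod, my-container, my-node and the busybox image are assumptions, and kubectl debug requires a cluster version where it is available):

    # 1. Debugging with container exec: open a shell in a running container
    kubectl exec -it my-pod --container my-container -- sh

    # 2. Debugging with an ephemeral debug container attached to the Pod
    kubectl debug -it my-pod --image=busybox --target=my-container

    # 3. Debugging via a shell on the node (runs a debugging Pod on that node
    #    with access to the host)
    kubectl debug node/my-node -it --image=busybox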

The methods above will give you more info regarding the actual reasons behind the Job termination.

Upvotes: 2
