MaatDeamon

Reputation: 9761

Failing a Kubernetes Deployment

Is there a way to make a Deployment stop recreating pods once they have failed multiple times? In other words, given that we can't, for instance, set restartPolicy: Never in the pod template of a Deployment, how can I have a service considered failed and left in a stopped state?

We have a use case where we absolutely need Kubernetes to stop a Deployment whose pods are all constantly failing.

Upvotes: 1

Views: 958

Answers (2)

Mark McLaren

Reputation: 11540

In my case I had a deployment that was in a fail and restart loop where the pod and its logs didn't stay around long enough for me to work out what had gone wrong.

As a workaround, I temporarily changed the startup command so that the pod would be kept alive even if the intended startup command failed. This allowed me to review the logs and remote into the container to work out what the issue was (I was using the Kubernetes dashboard to view logs and remote into the container).

Here is a simplified example. Imagine your deployment contains something like this (only key parts shown):

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: ...
          image: ...
          imagePullPolicy: ...
          ports:
            - name: http
              containerPort: ...
              protocol: TCP
          command:
            - '/bin/sh'
            - '-c'
            - "(echo 'Starting up...' && exit 1 && tail -f /dev/null) || (echo 'Startup failed.' && tail -f /dev/null)"
          ....

I have a bash shell installed in my Docker container, although this example only needs /bin/sh. The shell attempts everything inside the parentheses before the double pipe "||", and if that group exits with a non-zero status it runs everything after the double pipe. In the example, "Starting up..." is printed, then "exit 1" immediately ends the first group with a failure, so the commands after the "||" run: "Startup failed." is printed, followed by tail -f /dev/null, which keeps the container running. I can then review the logs and remote in to run additional checks.
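To see the fallback behaviour without a cluster, here is a minimal sketch you can run in any POSIX shell. It keeps the same shape as the container command, but I have replaced tail -f /dev/null with plain echo placeholders (my assumption, purely so the script terminates instead of blocking forever).

```shell
#!/bin/sh
# Sketch of the "( try ) || ( fallback )" pattern from the container command.
# Assumption: the echo placeholders stand in for `tail -f /dev/null`, which
# would otherwise block forever and keep the container alive.

output=$( (echo 'Starting up...' && exit 1 && echo 'kept alive after success') \
       || (echo 'Startup failed.' && echo 'kept alive after failure') )

# `exit 1` only ends the first subshell, so the group after `||` still runs.
printf '%s\n' "$output"
```

This prints "Starting up...", then "Startup failed.", then "kept alive after failure". The "kept alive after success" line never appears because "exit 1" ends the first group before reaching it.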

Upvotes: 1

Frank Yucheng Gu

Reputation: 1889

Consider using a Job instead of a Deployment. According to the docs:

Use a Job for Pods that are expected to terminate, for example, batch computations. Jobs are appropriate only for Pods with restartPolicy equal to OnFailure or Never.
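As a rough sketch of what that could look like, the script below writes a minimal Job manifest to a file. The Job name, image, command and the file name failing-job.yaml are all placeholders I made up for illustration. The backoffLimit field is what caps the retries: once that many attempts have failed, the Job is marked as failed and no further pods are created.

```shell
#!/bin/sh
# Write a hypothetical Job manifest; every name and value below is a placeholder.
cat > failing-job.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job            # hypothetical name
spec:
  backoffLimit: 3              # give up after 3 failed retries
  template:
    spec:
      restartPolicy: Never     # Jobs only allow Never or OnFailure
      containers:
        - name: main
          image: busybox       # placeholder image
          command: ['/bin/sh', '-c', 'echo "Starting up..." && exit 1']
EOF

# Then, against a real cluster (not run here):
#   kubectl apply -f failing-job.yaml
#   kubectl describe job example-job
```

Whether this fits depends on the workload: a Job runs pods to completion, so it is not a drop-in replacement for a long-running service behind a Deployment.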

Hope this helps!

Upvotes: 1
