Reputation: 540
The scenario is as follows:
Our pods have a terminationGracePeriodSeconds
of 60, which gives them ~60 seconds to do any necessary cleanup before Kubernetes kills them ungracefully. In the majority of cases the cleanup finishes well within the 60 seconds, but every now and then we (manually) observe pods that didn't complete their graceful termination and were killed by Kubernetes.
How does one monitor these situations? When I try to replicate this scenario with a simple Linux image and sleep, I don't see Kubernetes logging an additional event after the "Killed" one. Without an additional event this is impossible to monitor from the infrastructure side.
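For reference, this is roughly the kind of reproduction I've been trying (a sketch only; the pod name, image and sleep durations are placeholders): the main process ignores SIGTERM, so it survives until the grace period expires and Kubernetes sends SIGKILL.
apiVersion: v1
kind: Pod
metadata:
  name: slow-shutdown-test
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: sleeper
    image: busybox
    # Ignore SIGTERM so the container outlives the grace period and is SIGKILLed
    command: ["/bin/sh", "-c", "trap '' TERM; sleep 3600"]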
Upvotes: 1
Views: 2020
Reputation: 993
You can use container lifecycle hooks and then monitor the events those hooks generate. For example, the preStop hook, which is called when a Pod is being terminated, fires a FailedPreStopHook event if it cannot complete its work within terminationGracePeriodSeconds, and that event is something you can alert on (see the watch command after the example below):
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
      preStop:
        exec:
          command: ["/bin/sh", "-c", "nginx -s quit; while killall -0 nginx; do sleep 1; done"]
https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination
https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/
Upvotes: 2