170730350

Reputation: 622

Cilium pods stuck in Terminating state when running helm delete

I have Cilium installed in my test cluster (on AWS, with the AWS CNI removed because we use the Cilium CNI plugin), and whenever I delete the cilium namespace (or run helm delete), the hubble-ui pod gets stuck in the Terminating state. The pod has a couple of containers, but I notice that the container named backend exits with code 137 when the namespace is deleted, leaving the hubble-ui pod, and the namespace it lives in, stuck in Terminating. From what I have read, containers exit with 137 when they try to use more memory than they have been allocated, but no resource limits are defined in my test cluster (spec.containers[*].resources = {}) on either the pod or the namespace, and no error message is shown as the reason. I am using the Cilium Helm chart v1.12.3, but this issue was happening even before we updated to that chart version.
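For reference, this is roughly how I checked the exit code and confirmed that no limits are set. The cilium namespace and the k8s-app=hubble-ui label match my install and may differ in yours:

    # Last-state exit code of the backend container (137 = 128 + SIGKILL)
    kubectl -n cilium get pod -l k8s-app=hubble-ui \
      -o jsonpath='{.items[0].status.containerStatuses[?(@.name=="backend")].lastState.terminated.exitCode}'

    # Show the (empty) resource requests/limits on the pod's containers
    kubectl -n cilium get pod -l k8s-app=hubble-ui \
      -o jsonpath='{.items[0].spec.containers[*].resources}'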

I would like to know what is causing this, as it is breaking my CI pipeline. How can I ensure a graceful exit of the backend container (as opposed to just clearing the finalizers)?

Upvotes: 0

Views: 969

Answers (1)

170730350

Reputation: 622

It appears there is a bug in the backend application/container of the hubble-ui service: Kubernetes sends the container a SIGTERM on deletion and the process never responds. I verified this by getting a shell into the container and sending SIGTERM and SIGINT (the signals the application appears to listen for in order to exit); it does not respond to either one.
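For reference, this is roughly how I tested it. The namespace is the one from my install, and I am assuming /bin/sh is available in the image (it is, since the preStop hook below relies on it):

    # Get a shell in the backend container of the hubble-ui deployment
    kubectl -n cilium exec -it deploy/hubble-ui -c backend -- /bin/sh

    # Inside the container: PID 1 is the backend process;
    # it keeps running after both of these
    kill -TERM 1
    kill -INT 1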

Next, I added a preStop hook like the one below, and the pod now terminates cleanly:

...
        lifecycle:
          preStop:
            # Kill PID 1 (the backend process) directly, since it ignores
            # SIGTERM; the trailing "true" keeps the hook from reporting failure
            exec:
              command: ["/bin/sh", "-c", "kill -SIGILL 1; true"]
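If you would rather not maintain a patched copy of the rendered manifests, the same hook can be added after the Helm install with a strategic merge patch; the namespace here is the one from my install, adjust as needed:

    kubectl -n cilium patch deployment hubble-ui --type=strategic --patch '
    spec:
      template:
        spec:
          containers:
            - name: backend
              lifecycle:
                preStop:
                  exec:
                    command: ["/bin/sh", "-c", "kill -SIGILL 1; true"]
    '

The strategic merge patch matches the container by name, so only the backend container gets the lifecycle block.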

Upvotes: 0
