Quyen Nguyen Tuan

Reputation: 1675

Kubernetes pod is reported as running while it is not

I'm getting a weird bug: the pod is reported as running by the k8s api-server, but the container running the application has actually exited. Only the pause container gcr.io/google_containers/pause:0.8.0 is running, not the actual container.

$ docker ps -a | grep ms-issue
1754ddbbfbd8        agencyrev/workflow.microservice.issue:v0.0.9                          "npm start"            2 days ago          Exited (1) 11 hours ago                       k8s_workflow-microservice-issue.458c077c_rc--ms-issue--v0.0.9-btryt_staging_18d44bae-dac7-11e5-889c-00155d08db02_965dee2f
30c0addd88ef        gcr.io/google_containers/pause:0.8.0                                  "/pause"               2 days ago          Up 2 days                                     k8s_POD.b5de0404_rc--ms-issue--v0.0.9-btryt_staging_18d44bae-dac7-11e5-889c-00155d08db02_e427af83

As you can see, the app container exited 11 hours ago, but the pause:0.8.0 container is still running, which is why the pod is reported as running. I noticed this issue because I kept getting the error Dial failed: connection refused from kube-proxy. And it's not just this pod; some other pods on the same host ran into this as well.

I don't know what caused it, but is that possible? And how?

I'm using Kubernetes version v1.1.7:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.7", GitCommit:"e4e6878293a339e4087dae684647c9e53f1cf9f0", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.7", GitCommit:"e4e6878293a339e4087dae684647c9e53f1cf9f0", GitTreeState:"clean"}

$ docker version
Client version: 1.7.1
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 2c2c52b-dirty
OS/Arch (client): linux/amd64
Server version: 1.7.1
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 2c2c52b-dirty
OS/Arch (server): linux/amd64

$ uname -a
Linux dev-coreos-k8s_14 4.1.5-coreos #2 SMP Thu Aug 13 09:18:45 UTC 2015 x86_64 Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz GenuineIntel GNU/Linux

The issue above leads to another issue: I cannot stop the pod without the --grace-period=0 option (the status stays at Terminating with the default grace period of 30s). And even after the pod was stopped, the pause container was still there; I had to stop it with docker stop.

Upvotes: 1

Views: 485

Answers (2)

Christian Grabowski

Reputation: 2882

Both Kubernetes and the Docker daemon will report the pod/container (there is a difference) as running if PID 1 is running in the container, or if PID 1 of every container in the pod is running. So you can have something such as supervisord, a shell script, or another user-space init system running first that then spawns more processes, or anything spawning additional processes. The lifecycle of both pods and containers is tied to PID 1, so --grace-period=0 kills PID 1 immediately; otherwise a kill first sends a SIGTERM, which PID 1 most likely reacts to while staying alive.
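A minimal sketch of that SIGTERM behavior, in plain Python outside of any container (the child process here stands in for a PID 1 that traps the termination signal): SIGTERM alone leaves it running through the grace period, while SIGKILL (what --grace-period=0 effectively falls back to) stops it at once.

```python
import signal
import subprocess
import sys
import time

# Spawn a child that ignores SIGTERM, like a container PID 1
# (e.g. a shell script or init system with a TERM trap).
child = subprocess.Popen([sys.executable, "-c",
    "import signal, time; "
    "signal.signal(signal.SIGTERM, signal.SIG_IGN); "
    "time.sleep(60)"])

time.sleep(0.5)                    # let the child install its handler
child.send_signal(signal.SIGTERM)  # the "graceful" kill
time.sleep(0.5)
print(child.poll())                # None: still alive despite SIGTERM

child.kill()                       # SIGKILL cannot be ignored
child.wait()
print(child.returncode)            # -9 on Linux: killed immediately
```

This is why a pod whose PID 1 swallows SIGTERM sits at Terminating for the full grace period before the kubelet escalates to SIGKILL.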

Upvotes: 0

cloudnoob

Reputation: 85

This seems to be specific to the pod/image you are running. Can you check the logs and see why that pod exited? Can you try another image from Docker Hub?

Upvotes: 1
