Reputation: 106
I've been working with a 6-node cluster for the last few weeks without issue. Earlier today we ran into an open-files issue (https://github.com/kubernetes/kubernetes/pull/12443/files), so I patched and restarted kube-proxy.
Since then, all rc-deployed pods on ALL nodes BUT node-01 get stuck in the Pending state, and there are no log messages stating the cause.
Looking at the Docker daemon on the nodes, the containers in the pods are actually running, and deleting the rc removes them. It appears to be some sort of callback issue between the state as the kubelet sees it and the kube-apiserver.
Cluster is running v1.0.3
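For reference, this is roughly how the two views can be compared (the kubectl wrapper image is the same one used throughout this post; the pod name and node IP come from the kube-dns example below):

# from the master: what the apiserver reports for the pod
docker run --rm -it lachie83/kubectl:prod describe pod kube-dns-v8-i0yac --namespace=kube-system

# on the assigned node (10.1.1.35): what Docker is actually running (kube containers are prefixed k8s_)
docker ps | grep k8s_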
Here's an example of the state
docker run --rm -it lachie83/kubectl:prod get pods --namespace=kube-system -o wide
NAME                READY     STATUS    RESTARTS   AGE   NODE
kube-dns-v8-i0yac   0/4       Pending   0          4s    10.1.1.35
kube-dns-v8-jti2e   0/4       Pending   0          4s    10.1.1.34
get events
Wed, 16 Sep 2015 06:25:42 +0000 Wed, 16 Sep 2015 06:25:42 +0000 1 kube-dns-v8 ReplicationController successfulCreate {replication-controller } Created pod: kube-dns-v8-i0yac
Wed, 16 Sep 2015 06:25:42 +0000 Wed, 16 Sep 2015 06:25:42 +0000 1 kube-dns-v8-i0yac Pod scheduled {scheduler } Successfully assigned kube-dns-v8-i0yac to 10.1.1.35
Wed, 16 Sep 2015 06:25:42 +0000 Wed, 16 Sep 2015 06:25:42 +0000 1 kube-dns-v8-jti2e Pod scheduled {scheduler } Successfully assigned kube-dns-v8-jti2e to 10.1.1.34
Wed, 16 Sep 2015 06:25:42 +0000 Wed, 16 Sep 2015 06:25:42 +0000 1 kube-dns-v8 ReplicationController successfulCreate {replication-controller } Created pod: kube-dns-v8-jti2e
scheduler log
I0916 06:25:42.897814 10076 event.go:203] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-dns-v8-jti2e", UID:"c1cafebe-5c3b-11e5-b3c4-020443b6797d", APIVersion:"v1", ResourceVersion:"670117", FieldPath:""}): reason: 'scheduled' Successfully assigned kube-dns-v8-jti2e to 10.1.1.34
I0916 06:25:42.904195 10076 event.go:203] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-dns-v8-i0yac", UID:"c1cafc69-5c3b-11e5-b3c4-020443b6797d", APIVersion:"v1", ResourceVersion:"670118", FieldPath:""}): reason: 'scheduled' Successfully assigned kube-dns-v8-i0yac to 10.1.1.35
tailing kubelet log file during pod create
tail -f kubelet.kube-node-03.root.log.INFO.20150916-060744.10668
I0916 06:25:04.448916 10668 config.go:253] Setting pods for source file : {[] 0 file}
I0916 06:25:24.449253 10668 config.go:253] Setting pods for source file : {[] 0 file}
I0916 06:25:44.449522 10668 config.go:253] Setting pods for source file : {[] 0 file}
I0916 06:26:04.449774 10668 config.go:253] Setting pods for source file : {[] 0 file}
I0916 06:26:24.450400 10668 config.go:253] Setting pods for source file : {[] 0 file}
I0916 06:26:44.450995 10668 config.go:253] Setting pods for source file : {[] 0 file}
I0916 06:27:04.451501 10668 config.go:253] Setting pods for source file : {[] 0 file}
I0916 06:27:24.451910 10668 config.go:253] Setting pods for source file : {[] 0 file}
I0916 06:27:44.452511 10668 config.go:253] Setting pods for source file : {[] 0 file}
kubelet process
root@kube-node-03:/var/log/kubernetes# ps -ef | grep kubelet
root 10668 1 1 06:07 ? 00:00:13 /opt/bin/kubelet --address=10.1.1.34 --port=10250 --hostname_override=10.1.1.34 --api_servers=https://kube-master-01.sj.lithium.com:6443 --logtostderr=false --log_dir=/var/log/kubernetes --cluster_dns=10.1.2.53 --config=/etc/kubelet/conf --cluster_domain=prod-kube-sjc1-1.internal --v=4 --tls-cert-file=/etc/kubelet/certs/kubelet.pem --tls-private-key-file=/etc/kubelet/certs/kubelet-key.pem
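Since the kubelet only ever logs updates from the file source above, a quick sanity check that the node can still reach the apiserver would be something along these lines (URL and cert paths are taken from the kubelet flags above; whether the serving certs are also accepted as client credentials depends on how auth is set up):

# from the node: is the apiserver reachable at all?
curl -k --cert /etc/kubelet/certs/kubelet.pem --key /etc/kubelet/certs/kubelet-key.pem https://kube-master-01.sj.lithium.com:6443/healthz

# does the apiserver know this node under the expected name?
curl -k --cert /etc/kubelet/certs/kubelet.pem --key /etc/kubelet/certs/kubelet-key.pem https://kube-master-01.sj.lithium.com:6443/api/v1/nodes/10.1.1.34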
node list
docker run --rm -it lachie83/kubectl:prod get nodes
NAME        LABELS                                         STATUS
10.1.1.30   kubernetes.io/hostname=10.1.1.30,name=node-1   Ready
10.1.1.32   kubernetes.io/hostname=10.1.1.32,name=node-2   Ready
10.1.1.34   kubernetes.io/hostname=10.1.1.34,name=node-3   Ready
10.1.1.35   kubernetes.io/hostname=10.1.1.35,name=node-4   Ready
10.1.1.42   kubernetes.io/hostname=10.1.1.42,name=node-5   Ready
10.1.1.43   kubernetes.io/hostname=10.1.1.43,name=node-6   Ready
Upvotes: 3
Views: 2130
Reputation: 106
The issue turned out to be an MTU issue between the node and the master. Once that was fixed, the problem was resolved.
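For anyone hitting the same thing, a rough way to confirm it (the interface name here is just an example; use whatever interface your nodes actually have) is to compare interface MTUs and send don't-fragment pings from a node to the master:

# on a node and on the master: check the interface MTU
ip link show eth0

# from a node: probe the path to the master with don't-fragment packets
# (1472 bytes of ICMP payload + 28 bytes of IP/ICMP headers = a full 1500-byte frame)
ping -M do -s 1472 -c 3 kube-master-01.sj.lithium.com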
Upvotes: 3
Reputation: 81
It looks like you built your cluster from scratch. Have you run the conformance test against your cluster yet? If not, could you please run it? Detailed information can be found at:
The conformance test should fail, or at least give us more information about your cluster setup. Please post the test results somewhere so that we can diagnose your problem further.
The problem is most likely that your kubelet and your kube-apiserver don't agree on the node name. I also noticed that you are using hostname_override.
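As a rough check (the kubectl wrapper image is the one from your question, and this assumes shell access to a node), compare the name the kubelet registers with what the apiserver lists:

# on the node: the hostname vs. the value passed to --hostname_override
hostname
ps -ef | grep kubelet

# on the master: node names as the apiserver sees them
docker run --rm -it lachie83/kubectl:prod get nodes

If the name in the node list doesn't match the name the kubelet registers with, the scheduler can bind pods to a node object the kubelet never watches, which would leave them Pending.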
Upvotes: 0