Reputation: 10303
I have a working two-node k8s cluster. I added another node to the cluster, and the sudo kubeadm join ...
command reported that the node had joined the cluster. However, the new node is stuck in the NotReady state:
kubectl get nodes
NAME               STATUS     ROLES    AGE    VERSION
msi-ubuntu18       NotReady   <none>   29m    v1.19.0
tv                 Ready      master   131d   v1.18.6
ubuntu-18-extssd   Ready      <none>   131d   v1.17.4
The journalctl -u kubelet
shows this error:
Started kubelet: The Kubernetes Node Agent.
22039 server.go:198] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/l...
But the file /var/lib/kubelet/config.yaml exists and looks OK.
The sudo systemctl status kubelet
shows a different error:
kubelet.go:2103] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plu
cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
And there is no /etc/cni/ directory on the new node. (The existing node has /etc/cni/net.d/ with calico files in it.) If I run
kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
on the master again, it doesn't solve the problem. There is still no /etc/cni/ directory on the new node.
I must have missed a step when creating the new node. How do I get the /etc/cni/ directory on the new node? It's also puzzling that the kubeadm join ...
command indicates success when the new node is stuck in NotReady.
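The check described above (is there any CNI configuration on the node at all?) can be sketched as a small shell function. This is only an illustration: the function name cni_config_count is hypothetical, and the list of file extensions is an assumption about what the CNI loader typically picks up. On a Calico cluster the directory is normally populated by the calico-node pod once it is scheduled on the node, so an empty or missing directory suggests that pod never started there.

```shell
#!/bin/sh
# Hypothetical helper: count CNI config files in a directory.
# Defaults to /etc/cni/net.d, the path named in the kubelet log above.
# Prints the count and returns non-zero if the directory is missing or empty.
cni_config_count() {
    dir="${1:-/etc/cni/net.d}"
    if [ ! -d "$dir" ]; then
        echo 0
        return 1
    fi
    # Assumption: configs end in .conf, .conflist or .json.
    count=$(find "$dir" -maxdepth 1 -type f \
        \( -name '*.conf' -o -name '*.conflist' -o -name '*.json' \) \
        | wc -l | tr -d ' ')
    echo "$count"
    [ "$count" -gt 0 ]
}

# Usage on the new node:
#   cni_config_count || echo "no CNI config found"
# From the master, to see whether a calico-node pod landed on the new node:
#   kubectl get pods -n kube-system -o wide --field-selector spec.nodeName=msi-ubuntu18
```

If the count is 0 and no calico-node pod shows up for the node, the pod's own status and events are the next thing to look at rather than the kubelet.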
Upvotes: 1
Views: 5451
Reputation: 1
I just ran through a similar situation, but the proximate cause was at a higher level.
Basically I applied some Gatekeeper security policies to the kube-system
namespace without recognizing I'd have to make exceptions for kube-proxy
and aws-node
(this was in EKS).
A couple of examples from the kube event logs:
[denied by psp-pods-allowed-user-ranges] Container kube-proxy is attempting to run without a required securityContext/runAsGroup. Allowed runAsGroup: {"ranges": [{"max": 65535, "min": 1}], "rule": "MustRunAs"}
[denied by caps-constraints] container <kube-proxy> is not dropping all required capabilities. Container must drop all of ["ALL"]
[denied by psp-hostfs-constraints] HostPath volume {"name": "xtables-lock", "hostPath": {"path": "/run/xtables.lock", "type": "FileOrCreate"}} is not allowed, pod: kube-proxy-j5h2d. Allowed path: [{"pathPrefix": "/tmp", "readOnly": true}]
I didn't notice this for a solid month after I'd applied the changes; it only showed up after one of my EKS nodes restarted for some reason.
Posting here in hopes it might save somebody else the day I lost.
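A quick way to spot this kind of denial is to pull the constraint names out of the event text. The sketch below assumes the messages carry a "[denied by <name>]" prefix like the ones quoted above; the function name denied_constraints is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read event text on stdin and print each distinct
# constraint name found in a "[denied by <name>]" prefix.
denied_constraints() {
    sed -n 's/.*\[denied by \([^]]*\)].*/\1/p' | LC_ALL=C sort -u
}

# Usage (assumes the denials show up as events in kube-system):
#   kubectl get events -n kube-system | denied_constraints
```

An empty result does not prove nothing is being denied; it only means no line in the input matched that prefix.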
Upvotes: 0
Reputation: 147
I encountered the same situation after initializing the cluster with a pod CIDR: #kubeadm init --pod-network-cidr=10.10.0.0/16
However, the output of #kubectl get pods --all-namespaces helped me fix the issue.
Upvotes: -1
Reputation: 10303
For anyone else running into this problem, I was finally able to solve this by doing
kubectl delete -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
followed by
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
There was probably a version incompatibility between Calico 3.11, which I had installed a few months earlier, and the new node.
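To confirm that the reinstall took effect, you can filter the kubectl get nodes output for anything that is not Ready. This is a minimal sketch; the function name not_ready_nodes is hypothetical, and it assumes the default column layout shown in the question (NAME first, STATUS second).

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get nodes` output on stdin and print
# the names of nodes whose STATUS does not start with "Ready".
not_ready_nodes() {
    awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}

# Usage:
#   kubectl get nodes | not_ready_nodes
# Assumption: the Calico manifest creates a DaemonSet named calico-node in
# kube-system, so its rollout can be watched with:
#   kubectl rollout status daemonset/calico-node -n kube-system
```

Once the function prints nothing, every node is reporting Ready.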
Upvotes: 2