jersey bean

Reputation: 3639

Kubernetes worker node is NotReady due to CNI plugin not initialized

I'm using kind to run a test Kubernetes cluster on my local MacBook.

I found one of the nodes with status NotReady:

$ kind get clusters                                                                                                                                                                 
mc

$ kubectl get nodes
NAME                STATUS     ROLES    AGE     VERSION
mc-control-plane    Ready      master   4h42m   v1.18.2
mc-control-plane2   Ready      master   4h41m   v1.18.2
mc-control-plane3   Ready      master   4h40m   v1.18.2
mc-worker           NotReady   <none>   4h40m   v1.18.2
mc-worker2          Ready      <none>   4h40m   v1.18.2
mc-worker3          Ready      <none>   4h40m   v1.18.2

The only interesting thing in kubectl describe node mc-worker is that the CNI plugin is not initialized:

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Tue, 11 Aug 2020 16:55:44 -0700   Tue, 11 Aug 2020 12:10:16 -0700   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 11 Aug 2020 16:55:44 -0700   Tue, 11 Aug 2020 12:10:16 -0700   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 11 Aug 2020 16:55:44 -0700   Tue, 11 Aug 2020 12:10:16 -0700   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Tue, 11 Aug 2020 16:55:44 -0700   Tue, 11 Aug 2020 12:10:16 -0700   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

I have 2 similar clusters and this only occurs on this cluster.

Since kind uses the local Docker daemon to run these nodes as containers, I have already tried restarting the container (which should be the equivalent of rebooting the node).
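For reference, the restart looked roughly like this (a sketch; kind names the node containers after the nodes by default, so the container name below is an assumption about this setup):

$ docker ps --filter name=mc-worker
$ docker restart mc-worker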

I have considered deleting and recreating the cluster, but there ought to be a way to solve this without recreating the cluster.

Here are the versions that I'm running:

$ kind version                                                                                                                                                                     
kind v0.8.1 go1.14.4 darwin/amd64

$ kubectl version                                                                                                                                                  
Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-30T20:19:45Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

How do you resolve this issue?

Upvotes: 13

Views: 49872

Answers (7)

jehanzaib

Reputation: 105

Run these commands on your control-plane node after running:

sudo kubeadm init

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

After that:

kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml

Then, on the worker node, run:

sudo kubeadm reset

and then run the command to join the worker node to the cluster (your token and hash will differ):

sudo kubeadm join 10.8.1.64:6443 --token gx2bz1.rnoe8bf1tizle4up \
    --discovery-token-ca-cert-hash sha256:25c93c64c4625e3c66c4621aef5f80af9938bd6b34f0d4d419c9e1ed8d7432db
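If the original token has expired, kubeadm can print a fresh join command from the control plane:

sudo kubeadm token create --print-join-command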

Upvotes: 0

Rytis Dereskevicius

Reputation: 1471

If you are using an AWS EKS cluster and see the error "CNI plugin not initialized" or "nodes not joining the Kubernetes cluster", make sure you have the correct add-ons installed.

  1. Navigate to the EKS cluster in the AWS console
  2. Go to the add-ons section.
  3. Install the following add-ons: CoreDNS, kube-proxy, and VPC CNI. Make sure to set the "Conflict resolution method" to "Override".

This should solve the issue. I hope this saves someone a couple of hours in the future, especially since AWS is pushing Kubernetes version updates quickly.
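The same add-ons can also be installed with the AWS CLI; a minimal sketch, assuming a cluster named my-cluster (substitute your own cluster name, and pin add-on versions if you need to):

aws eks create-addon --cluster-name my-cluster --addon-name vpc-cni --resolve-conflicts OVERWRITE
aws eks create-addon --cluster-name my-cluster --addon-name coredns --resolve-conflicts OVERWRITE
aws eks create-addon --cluster-name my-cluster --addon-name kube-proxy --resolve-conflicts OVERWRITE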


Upvotes: 1

user25052085

Reputation: 1

I got the same problem; I found it was because I had not installed the CNI network plugins.

  1. Check the CNI network; you must have a cni0 interface:

    root@k8s-main:/home/zz/k8s# ifconfig cni0
    cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
            inet 10.244.0.1  netmask 255.255.255.0  broadcast 10.244.0.255
            inet6 fe80::a85a:79ff:fe3f:a28e  prefixlen 64  scopeid 0x20
            ether aa:5a:79:3f:a2:8e  txqueuelen 1000  (Ethernet)
            RX packets 1295  bytes 107173 (107.1 KB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 1451  bytes 157776 (157.7 KB)
            TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

  2. If you cannot see the above:
     2.1 Download the CNI plugins Linux package from this link: https://github.com/containernetworking/plugins/releases/download/v0.8.6/cni-plugins-linux-amd64-v0.8.6.tgz
     2.2 Unpack cni-plugins-linux-amd64-v0.8.6.tgz into the directory /opt/cni/bin
     2.3 Download the Flannel manifest: https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
     2.4 Run kubectl apply -f kube-flannel.yml (a shell sketch of these steps follows below)
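A sketch of step 2 as shell commands, assuming the node has curl and that /opt/cni/bin is the CNI directory your kubelet is configured to use:

# Install the CNI plugin binaries (paths as given in the steps above)
curl -LO https://github.com/containernetworking/plugins/releases/download/v0.8.6/cni-plugins-linux-amd64-v0.8.6.tgz
sudo mkdir -p /opt/cni/bin
sudo tar -xzf cni-plugins-linux-amd64-v0.8.6.tgz -C /opt/cni/bin

# Deploy Flannel so the cni0 interface gets created
curl -LO https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
kubectl apply -f kube-flannel.yml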

Then everything should run smoothly.

Upvotes: 0

AniketGole

Reputation: 1285

Stopping and disabling AppArmor and restarting the containerd service on that node should solve your issue:

root@node:~# systemctl stop apparmor
root@node:~# systemctl disable apparmor 
root@node:~# systemctl restart containerd.service
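Once containerd is back up, the node should go Ready within a minute or so; you can watch for it with:

kubectl get nodes -w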

Upvotes: 5

denizg

Reputation: 952

In my case, I added the first node group to my new EKS cluster and its status was Failed, with this message appearing in the logs. None of the above solved my problem, and I was already using the latest CNI add-on. The problem was the role I had created for the node group: it had AmazonEKSWorkerNodePolicy and AmazonEC2ContainerRegistryFullReadonlyAccess, but I had forgotten to add AmazonEKS_CNI_Policy. After adding this policy, my problem was solved.
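For reference, attaching the missing policy from the CLI looks roughly like this (my-node-group-role is a placeholder for the node group's IAM role name):

aws iam attach-role-policy \
    --role-name my-node-group-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy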

Upvotes: 0

SJJ

Reputation: 31

I encountered this scenario. The master is Ready but the worker nodes are not. After some investigation, I found out that /opt/cni/bin was empty: there was no network plugin for my worker node hosts. So I installed the kubernetes-cni.x86_64 package and restarted the kubelet service. This solved the "NotReady" status of my worker nodes.
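A sketch of that fix on a yum-based host (the .x86_64 suffix suggests RHEL/CentOS; adjust the package manager for your distro):

# Install the CNI plugin binaries that kubelet expects under /opt/cni/bin
sudo yum install -y kubernetes-cni
sudo systemctl restart kubelet

# Confirm the plugins are now present
ls /opt/cni/bin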

Upvotes: 3

Rico

Reputation: 61521

Most likely cause:

The docker VM is running out of some resource and cannot start CNI on that particular node.

You can poke around in the HyperKit VM by connecting to it:

From a shell:

screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty

If that doesn't work for some reason:

docker run -it --rm --privileged --pid=host alpine nsenter -t 1 -m -u -n -i sh

Once in the VM:

# ps -Af
# free
# df -h
...

Then you can always update the resource settings in the Docker Desktop UI.


Finally, your node is, after all, running in a container, so you can connect to that container and check what kubelet errors you see:

docker ps
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS                       NAMES
6d881be79f4a        kindest/node:v1.18.2   "/usr/local/bin/entr…"   32 seconds ago      Up 29 seconds       127.0.0.1:57316->6443/tcp   kind-control-plane
docker exec -it 6d881be79f4a bash
root@kind-control-plane:/# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/kind/systemd/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Wed 2020-08-12 02:32:16 UTC; 35s ago
     Docs: http://kubernetes.io/docs/
 Main PID: 768 (kubelet)
    Tasks: 23 (limit: 2348)
   Memory: 32.8M
   CGroup: /docker/6d881be79f4a8ded3162ec6b5caa8805542ff9703fabf5d3d2eee204a0814e01/system.slice/kubelet.service
           └─768 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet
/config.yaml --container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock --fail-swap-on=false --node-ip= --fail-swap-on=false
...
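From inside the same node container, the kubelet logs are usually the quickest pointer to why CNI never initialized; kind node images run systemd, so journalctl should be available (a sketch, with the standard CNI paths assumed):

root@kind-control-plane:/# journalctl -u kubelet --no-pager | tail -n 50
root@kind-control-plane:/# ls /etc/cni/net.d /opt/cni/bin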

✌️

Upvotes: 5
