lex

Reputation: 1861

How to debug when Kubernetes nodes are in 'Not Ready' state

I initialized the master node and added 2 worker nodes, but only the master and one of the worker nodes show up when I run the following command:

kubectl get nodes

Also, both of these nodes are in the 'Not Ready' state. What steps should I take to understand what the problem could be?

Upvotes: 98

Views: 262960

Answers (8)

nobjta_9x_tq

Reputation: 1241

In my case, I had 4 nodes in VMware. After shutting all the machines down and powering them back on a few days later, telnet to port 6443 on the master no longer worked. So I ran iptables -F to clear all the broken rules. After that, the nodes could see each other on port 6443 and the Ready status came back.

[root@master1 roof]# ku get node
NAME      STATUS     ROLES           AGE   VERSION
master1   Ready      control-plane   33h   v1.28.2
worker1   NotReady   <none>          15m   v1.28.2
worker2   NotReady   <none>          15m   v1.28.2
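
A quick way to confirm whether the API server port is actually reachable from a worker (a simple connectivity check; substitute your own control-plane IP, here using the 192.168.1.195 address from the join command below, with telnet as an alternative if nc is not installed):

nc -vz 192.168.1.195 6443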

Run on all nodes:

iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -t nat -F
iptables -t mangle -F
iptables -F
iptables -X
iptables-save | grep -v KUBE | grep -v cali > clear.rules 
iptables-restore < clear.rules
modprobe overlay && modprobe br_netfilter && echo '1' > /proc/sys/net/ipv4/ip_forward
## re-join any node that was removed from the cluster:
kubeadm join 192.168.1.195:6443 --token 2z5bhp.y9o0t1ppe4t28wtx --discovery-token-ca-cert-hash sha256:adfe17ca3aef931d9c46a373a5899edfeed4b41486b9d9c3a611dd0e074e5dab --v=5
# back up the rules so they persist across the next reboot, once all nodes are Ready:
iptables-save > /etc/sysconfig/iptables

Result:

[root@master1 roof]# ku get node
NAME           STATUS   ROLES           AGE     VERSION
master1        Ready    control-plane   34h     v1.28.2
worker-node1   Ready    <none>          4m21s   v1.28.2
worker-node2   Ready    <none>          83s     v1.28.2

Upvotes: 0

Adeyemi Kayode

Reputation: 31

If minikube is installed, run

minikube start --vm-driver docker

or

minikube start

Running minikube start without a driver flag will automatically detect and use an available driver (such as KVM).

To check the status, run

minikube status

The output

minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured

One common cause of this issue is restarting or shutting down your computer/server.
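
Once minikube reports everything as Running again, a quick check (nothing minikube-specific assumed here) should show the node back in the Ready state:

kubectl get nodes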

Upvotes: 0

Abdel Hegazi

Reputation: 398

I recently had this issue. Checking the known issues on the kind website (https://kind.sigs.k8s.io/docs/user/known-issues/) tells you that the main problem mostly comes from a lack of memory allocated to Docker. They actually advise allocating 8 GB to Docker; I went from 3 GB up to 6 GB and it worked fine for me. This is the kind version I am running at the moment:

$ kind version
kind v0.10.0 go1.15.7 darwin/amd64

and this is docker version

$ docker version
Client:
 Cloud integration: 1.0.17
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:55:20 2021
 OS/Arch:           darwin/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:52:10 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b63
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I hope this helps you or anyone facing the same issue. Here is the output from kind:

$ k get node
NAME                  STATUS   ROLES                  AGE     VERSION
test2-control-plane   Ready    control-plane,master   4m42s   v1.20.2
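
If you want to verify how much memory Docker actually has before and after resizing it, something like this works (a small check, assuming a reasonably recent Docker CLI; the value is reported in bytes):

docker info --format '{{.MemTotal}}'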

Upvotes: 1

ReadyPlayer1

Reputation: 21

I found that applying the pod network and rebooting both nodes did the trick for me:

kubectl apply -f [podnetwork].yaml
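
Here [podnetwork].yaml is whichever CNI manifest your cluster was set up with. As a purely illustrative example, if the cluster uses Calico it might look like this (check the Calico documentation for the current manifest URL for your version):

kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml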

Upvotes: 2

rslj

Reputation: 380

I recently started using VMware Octant (https://github.com/vmware-tanzu/octant). It is a better UI than the Kubernetes Dashboard: you can view the cluster, look at its details and pods, check the logs, and open a terminal into the pod(s).

Upvotes: 1

Deepak

Reputation: 421

Steps to debug:

When you face any issue in Kubernetes, the first step is to check whether the Kubernetes system pods themselves are running fine.

Command to check: kubectl get pods -n kube-system

If you see any pod crashing, check its logs.

If a node is in the NotReady state, check the network (CNI) pod logs, as shown below.
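
For example, if the cluster's network plugin is Calico (substitute the label selector for whatever CNI you run; the selector below is the usual one for the calico-node DaemonSet):

kubectl logs -n kube-system -l k8s-app=calico-node --tail=50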

If you cannot resolve it with the above, follow these steps:

  1. kubectl get nodes   # check which node is not in Ready state

  2. kubectl describe node <nodename>   # the node which is not in Ready state

  3. SSH to that node

  4. Execute systemctl status kubelet   # make sure kubelet is running

  5. systemctl status docker   # make sure the docker service is running

  6. journalctl -u kubelet   # to check the logs in depth

Most probably you will find the error here. After fixing it, restart kubelet with the commands below:

  1. systemctl daemon-reload
  2. systemctl restart kubelet

If you still haven't found the root cause, check the following:

  1. Make sure your node has enough disk space and memory; check the /var directory especially. Commands to check: df -kh, free -m

  2. Verify CPU utilization with the top command, and make sure no process is using an unexpected amount of memory.

Upvotes: 36

Syed Faraz Umar

Reputation: 438

I was having a similar issue, but for a different reason:

Error:

cord@node1:~$ kubectl get nodes
NAME    STATUS     ROLES    AGE     VERSION
node1   Ready      master   17h     v1.13.5
node2   Ready      <none>   17h     v1.13.5
node3   NotReady   <none>   9m48s   v1.13.5

cord@node1:~$ kubectl describe node node3
Name:               node3
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  Ready            False   Thu, 18 Apr 2019 01:15:46 -0400   Thu, 18 Apr 2019 01:03:48 -0400   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
  InternalIP:  192.168.2.6
  Hostname:    node3

cord@node3:~$ journalctl -u kubelet

Apr 18 01:24:50 node3 kubelet[54132]: W0418 01:24:50.649047   54132 cni.go:149] Error loading CNI config list file /etc/cni/net.d/10-calico.conflist: error parsing configuration list: no 'plugins' key
Apr 18 01:24:50 node3 kubelet[54132]: W0418 01:24:50.649086   54132 cni.go:203] Unable to update cni config: No valid networks found in /etc/cni/net.d
Apr 18 01:24:50 node3 kubelet[54132]: E0418 01:24:50.649402   54132 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Apr 18 01:24:55 node3 kubelet[54132]: W0418 01:24:55.650816   54132 cni.go:149] Error loading CNI config list file /etc/cni/net.d/10-calico.conflist: error parsing configuration list: no 'plugins' key
Apr 18 01:24:55 node3 kubelet[54132]: W0418 01:24:55.650845   54132 cni.go:203] Unable to update cni config: No valid networks found in /etc/cni/net.d
Apr 18 01:24:55 node3 kubelet[54132]: E0418 01:24:55.651056   54132 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Apr 18 01:24:57 node3 kubelet[54132]: I0418 01:24:57.248519   54132 setters.go:72] Using node IP: "192.168.2.6"

Issue:

My 10-calico.conflist file was incorrect. I verified this against a different node and against the sample file in the same directory, "calico.conflist.template".
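
For reference, the kubelet expects a *.conflist file to be a JSON object with a top-level "plugins" array, which was exactly what my file was missing. A minimal sketch of the expected shape (the plugin entries below are illustrative, not a complete Calico configuration):

{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico"
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true }
    }
  ]
}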

Resolution:

Fixing the "10-calico.conflist" file and restarting the service with "systemctl restart kubelet" resolved my issue:

NAME    STATUS   ROLES    AGE   VERSION
node1   Ready    master   18h   v1.13.5
node2   Ready    <none>   18h   v1.13.5
node3   Ready    <none>   48m   v1.13.5

Upvotes: 2

Shahidh

Reputation: 2682

First, describe nodes and see if it reports anything:

$ kubectl describe nodes

Look for conditions, capacity and allocatable:

Conditions:
  Type              Status
  ----              ------
  OutOfDisk         False
  MemoryPressure    False
  DiskPressure      False
  Ready             True
Capacity:
 cpu:       2
 memory:    2052588Ki
 pods:      110
Allocatable:
 cpu:       2
 memory:    1950188Ki
 pods:      110
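
A quick way to list just the Ready condition for every node (a small jsonpath sketch; adjust the output to taste):

$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'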

If everything is alright here, SSH into the node and observe the kubelet logs to see if they report anything, like certificate errors, authentication errors, etc.

If kubelet is running as a systemd service, you can use

$ journalctl -u kubelet
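
To narrow the output down to the kinds of errors mentioned above, a rough filter like this can help (not an official recipe; tune the pattern as needed):

$ journalctl -u kubelet --since "1 hour ago" --no-pager | grep -iE 'error|fail|certificate|unauthorized'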

Upvotes: 136
