yasin lachini

Reputation: 5976

my kubernetes cluster is down after reboot

After every reboot my Kubernetes cluster stops working and I get this error:

The connection to the server 192.168.1.4:6443 was refused - did you specify the right host or port?

I have 4 Ubuntu bare-metal machines: one is the master and 3 are workers. I turned off swap and disabled it. I read somewhere that I should run these commands to solve it:

sudo -i
swapoff -a
exit
strace -eopenat kubectl version

and it works. But why does this happen?

Upvotes: 2

Views: 10936

Answers (4)

Taha Yousuf Ali

Reputation: 11

This issue most likely occurs when swap has been turned back on and the KUBECONFIG variable has been lost after the VM restart. Follow the steps below to resolve it:

swapoff -a (turn swap off) - by default, the kubelet refuses to start while swap is on.
export KUBECONFIG=$HOME/.kube/config - export the kubeconfig variable for kubectl.

Now run:

kubectl get nodes

The cluster and pods will take some time to recover. They might show Unknown status at first, but after 5-10 minutes they should return to the Running state.
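An export typed into a shell only lasts for that session, which is why the variable disappears after a restart. A minimal sketch of making it persistent, assuming a Bash login and a hypothetical helper named persist_kubeconfig (not part of kubectl), could look like this. Note that kubectl already falls back to $HOME/.kube/config when KUBECONFIG is unset, so this mainly matters if the variable is expected by other tooling or the path is changed.

```shell
#!/bin/sh
# Hypothetical helper: append the KUBECONFIG export to a shell profile
# exactly once, so it survives reboots.
# $1: profile file to edit (assumed default: ~/.bashrc)
persist_kubeconfig() {
    profile="${1:-$HOME/.bashrc}"
    # Single quotes keep $HOME literal, so it is expanded at login time.
    line='export KUBECONFIG=$HOME/.kube/config'
    # -x: match the whole line, -F: fixed string, -q: no output.
    grep -qxF "$line" "$profile" 2>/dev/null || printf '%s\n' "$line" >> "$profile"
}
```

Calling persist_kubeconfig with no argument edits ~/.bashrc; running it a second time changes nothing because the line is already present.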

Upvotes: 0

Mostafa Ghadimi

Reputation: 6736

It depends on how you installed the cluster.

In this post, I will mention possible ways to resolve this problem.

  1. Make sure the swap is off.

    swapoff -a
    
  2. Check the state of the kubelet. If it has exited and can't run properly, check its log.

    journalctl -xfu kubelet.service
    

    In my case the log wasn't that helpful, so I looked at the logs of other components. After searching for a clue, I found that the problem was with the cri-dockerd service, which may not be enabled in systemd.

    systemctl start cri-dockerd.service
    systemctl enable cri-dockerd.service
    

    Finally restart the kubelet service and check its status:

    systemctl restart kubelet.service
    systemctl status kubelet.service
    

Upvotes: 0

Mark

Reputation: 4067

First, please run systemctl status kubelet and verify that the service is running:
"Active: active (running)"
Then disable swap:

sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

Verify every reference to swap found in /etc/fstab.

Please also perform the post-"kubeadm init" steps for the current user, as described here: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
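
One way to do that verification is to list the fstab entries that are still active and have the filesystem type swap. This is a sketch, not a definitive check: active_swap_entries is a hypothetical helper name, and it assumes the standard whitespace-separated fstab layout in which the third field is the type.

```shell
#!/bin/sh
# Hypothetical helper: print fstab lines that are not commented out and
# whose third field (the filesystem type) is "swap".
# $1: fstab-style file to inspect (default: /etc/fstab)
active_swap_entries() {
    awk '$1 !~ /^#/ && $3 == "swap" { print }' "${1:-/etc/fstab}"
}
```

If active_swap_entries prints nothing, no swap entry will be activated from /etc/fstab at the next boot.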

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

After reboot, please check:
systemctl status docker
Enable docker at startup if it's not working:
systemctl enable docker

You can also verify kubelet status:

systemctl status kubelet
systemctl enable kubelet

Look for any errors:

journalctl -u kubelet.service
journalctl

And please share your findings.

Upvotes: 9

P Ekambaram

Reputation: 17615

Most likely the kubelet is not being restarted. Check the kubelet logs and correct any issues you find.

Check that the cgroup driver used by Docker and the one used by the kubelet are the same.

Swap should be disabled, and so on
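
The driver comparison can be sketched as follows, assuming the driver meant here is the cgroup driver and that the kubelet configuration lives at /var/lib/kubelet/config.yaml (the kubeadm default; other installers may differ). The three function names are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: read the cgroupDriver value from a kubelet config file.
# $1: config file (assumed default: /var/lib/kubelet/config.yaml)
kubelet_cgroup_driver() {
    awk -F': *' '$1 == "cgroupDriver" { print $2 }' "${1:-/var/lib/kubelet/config.yaml}"
}

# Ask the Docker daemon which cgroup driver it uses (needs a running daemon).
docker_cgroup_driver() {
    docker info --format '{{.CgroupDriver}}'
}

# Succeed only if both driver names are non-empty and identical.
drivers_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}
```

A typical call would be drivers_match "$(kubelet_cgroup_driver)" "$(docker_cgroup_driver)" || echo "cgroup driver mismatch", run on the node whose kubelet fails to start.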

Upvotes: 0
