Reputation: 5149
I have a K8s cluster that was working properly but because of power failure, all the nodes got rebooted.
At the moment I have some problem recovering the master (and other nodes):
sudo systemctl kubelet status
returns Unknown operation kubelet.
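(Presumably this is just the argument order: systemctl expects the subcommand before the unit name, so the status check would normally be written as:
sudo systemctl status kubelet
which is likely why the form above is reported as an unknown operation.)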
When I run kubeadm init ...
(the command I set up the cluster with) it returns:
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10251]: Port 10251 is in use
[ERROR Port-10252]: Port 10252 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
When I checked those ports, I could see that kubelet and other K8s components are using them:
~/k8s-multi-node$ sudo lsof -i :10251
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
kube-sche 26292 root 3u IPv6 104933 0t0 TCP *:10251 (LISTEN)
~/k8s-multi-node$ sudo lsof -i :10252
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
kube-cont 26256 root 3u IPv6 115541 0t0 TCP *:10252 (LISTEN)
~/k8s-multi-node$ sudo lsof -i :10250
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
kubelet 24781 root 27u IPv6 106821 0t0 TCP *:10250 (LISTEN)
I tried to kill them, but they start using those ports again.
So what is the proper way to recover such a cluster? Do I need to remove kubelet and all the other components and install them again?
Upvotes: 5
Views: 9527
Reputation: 44559
You first need to stop the kubelet using sudo systemctl stop kubelet.service.
After that, run kubeadm reset and then kubeadm init.
Note that this will clean up the existing cluster state and create a new cluster altogether.
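For reference, the full sequence on the master would look roughly like this (the kubeadm init flags are whatever you originally used, so they are not shown here):
sudo systemctl stop kubelet.service       # stop kubelet so it no longer restarts the static control-plane pods
sudo kubeadm reset                        # cleans up /etc/kubernetes/manifests, /var/lib/etcd and other cluster state
sudo kubeadm init <your original flags>   # re-creates the control plane from scratch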
Regarding the proper way to recover, check this question.
Upvotes: 19