Hari
Hari

Reputation: 141

Kubelet Not Starting: Crashing with Exit Status: 255/n/a

Ubuntu 16.04 LTS, Docker 17.12.1, Kubernetes 1.10.0

Kubelet not starting:

Jun 22 06:45:57 dev-master systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a

Jun 22 06:45:57 dev-master systemd[1]: kubelet.service: Failed with result 'exit-code'.

Note: No issue with v1.9.1

LOGS:

Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.518085   20051 docker_service.go:249] Docker Info: &{ID:WDJK:3BCI:BGCM:VNF3:SXGW:XO5G:KJ3Z:EKIH:XGP7:XJGG:LFBL:YWAJ Containers:0 ContainersRunning:0 ContainersPaused:0 ContainersStopped:0 Images:1 Driver:btrfs DriverStatus:[[Build Version Btrfs v4.15.1] [Library Vers
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.521232   20051 docker_service.go:262] Setting cgroupDriver to cgroupfs
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.532834   20051 remote_runtime.go:43] Connecting to runtime service unix:///var/run/dockershim.sock
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.533812   20051 kuberuntime_manager.go:186] Container runtime docker initialized, version: 18.05.0-ce, apiVersion: 1.37.0
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.534071   20051 csi_plugin.go:61] kubernetes.io/csi: plugin initializing...
Jun 22 06:45:55 dev-master hyperkube[20051]: W0622 06:45:55.534846   20051 kubelet.go:903] Accelerators feature is deprecated and will be removed in v1.11. Please use device plugins instead. They can be enabled using the DevicePlugins feature gate.
Jun 22 06:45:55 dev-master hyperkube[20051]: W0622 06:45:55.535035   20051 kubelet.go:909] GPU manager init error: couldn't get a handle to the library: unable to open a handle to the library, GPU feature is disabled.
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.535082   20051 server.go:129] Starting to listen on 0.0.0.0:10250
Jun 22 06:45:55 dev-master hyperkube[20051]: E0622 06:45:55.535164   20051 kubelet.go:1282] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data for container /
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.535189   20051 server.go:944] Started kubelet
Jun 22 06:45:55 dev-master hyperkube[20051]: E0622 06:45:55.535555   20051 event.go:209] Unable to write event: 'Post https://10.50.50.201:8001/api/v1/namespaces/default/events: dial tcp 10.50.50.201:8001: getsockopt: connection refused' (may retry after sleeping)
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.535825   20051 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.536202   20051 status_manager.go:140] Starting to sync pod status with apiserver
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.536253   20051 kubelet.go:1782] Starting kubelet main sync loop.
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.536285   20051 kubelet.go:1799] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago; threshold is 3m0s]
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.536464   20051 volume_manager.go:247] Starting Kubelet Volume Manager
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.536613   20051 desired_state_of_world_populator.go:129] Desired state populator starts to run
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.538574   20051 server.go:299] Adding debug handlers to kubelet server.
Jun 22 06:45:55 dev-master hyperkube[20051]: W0622 06:45:55.538664   20051 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Jun 22 06:45:55 dev-master hyperkube[20051]: E0622 06:45:55.539199   20051 kubelet.go:2130] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.636465   20051 kubelet.go:1799] skipping pod synchronization - [container runtime is down]
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.636795   20051 kubelet_node_status.go:289] Setting node annotation to enable volume controller attach/detach
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.638630   20051 kubelet_node_status.go:83] Attempting to register node 10.50.50.201
Jun 22 06:45:55 dev-master hyperkube[20051]: E0622 06:45:55.638954   20051 kubelet_node_status.go:107] Unable to register node "10.50.50.201" with API server: Post https://10.50.50.201:8001/api/v1/nodes: dial tcp 10.50.50.201:8001: getsockopt: connection refused
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.836686   20051 kubelet.go:1799] skipping pod synchronization - [container runtime is down]
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.839219   20051 kubelet_node_status.go:289] Setting node annotation to enable volume controller attach/detach
Jun 22 06:45:55 dev-master hyperkube[20051]: I0622 06:45:55.841028   20051 kubelet_node_status.go:83] Attempting to register node 10.50.50.201
Jun 22 06:45:55 dev-master hyperkube[20051]: E0622 06:45:55.841357   20051 kubelet_node_status.go:107] Unable to register node "10.50.50.201" with API server: Post https://10.50.50.201:8001/api/v1/nodes: dial tcp 10.50.50.201:8001: getsockopt: connection refused
Jun 22 06:45:56 dev-master hyperkube[20051]: I0622 06:45:56.236826   20051 kubelet.go:1799] skipping pod synchronization - [container runtime is down]
Jun 22 06:45:56 dev-master hyperkube[20051]: I0622 06:45:56.241590   20051 kubelet_node_status.go:289] Setting node annotation to enable volume controller attach/detach
Jun 22 06:45:56 dev-master hyperkube[20051]: I0622 06:45:56.245081   20051 kubelet_node_status.go:83] Attempting to register node 10.50.50.201
Jun 22 06:45:56 dev-master hyperkube[20051]: E0622 06:45:56.245475   20051 kubelet_node_status.go:107] Unable to register node "10.50.50.201" with API server: Post https://10.50.50.201:8001/api/v1/nodes: dial tcp 10.50.50.201:8001: getsockopt: connection refused
Jun 22 06:45:56 dev-master hyperkube[20051]: E0622 06:45:56.492206   20051 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:450: Failed to list *v1.Service: Get https://10.50.50.201:8001/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.50.50.201:8001: getsockopt: connection refused
Jun 22 06:45:56 dev-master hyperkube[20051]: E0622 06:45:56.493216   20051 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.50.50.201:8001/api/v1/pods?fieldSelector=spec.nodeName%3D10.50.50.201&limit=500&resourceVersion=0: dial tcp 10.50.50.201:8001: getsockopt: co
Jun 22 06:45:56 dev-master hyperkube[20051]: E0622 06:45:56.494240   20051 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:459: Failed to list *v1.Node: Get https://10.50.50.201:8001/api/v1/nodes?fieldSelector=metadata.name%3D10.50.50.201&limit=500&resourceVersion=0: dial tcp 10.50.50.201:8001: getsockopt: connecti
Jun 22 06:45:57 dev-master hyperkube[20051]: I0622 06:45:57.036893   20051 kubelet.go:1799] skipping pod synchronization - [container runtime is down]
Jun 22 06:45:57 dev-master hyperkube[20051]: I0622 06:45:57.045705   20051 kubelet_node_status.go:289] Setting node annotation to enable volume controller attach/detach
Jun 22 06:45:57 dev-master hyperkube[20051]: I0622 06:45:57.047489   20051 kubelet_node_status.go:83] Attempting to register node 10.50.50.201
Jun 22 06:45:57 dev-master hyperkube[20051]: E0622 06:45:57.047787   20051 kubelet_node_status.go:107] Unable to register node "10.50.50.201" with API server: Post https://10.50.50.201:8001/api/v1/nodes: dial tcp 10.50.50.201:8001: getsockopt: connection refused
Jun 22 06:45:57 dev-master hyperkube[20051]: E0622 06:45:57.413319   20051 event.go:209] Unable to write event: 'Post https://10.50.50.201:8001/api/v1/namespaces/default/events: dial tcp 10.50.50.201:8001: getsockopt: connection refused' (may retry after sleeping)
Jun 22 06:45:57 dev-master hyperkube[20051]: E0622 06:45:57.492781   20051 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:450: Failed to list *v1.Service: Get https://10.50.50.201:8001/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.50.50.201:8001: getsockopt: connection refused
Jun 22 06:45:57 dev-master hyperkube[20051]: E0622 06:45:57.493560   20051 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.50.50.201:8001/api/v1/pods?fieldSelector=spec.nodeName%3D10.50.50.201&limit=500&resourceVersion=0: dial tcp 10.50.50.201:8001: getsockopt: co
Jun 22 06:45:57 dev-master hyperkube[20051]: E0622 06:45:57.494574   20051 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:459: Failed to list *v1.Node: Get https://10.50.50.201:8001/api/v1/nodes?fieldSelector=metadata.name%3D10.50.50.201&limit=500&resourceVersion=0: dial tcp 10.50.50.201:8001: getsockopt: connecti
Jun 22 06:45:57 dev-master hyperkube[20051]: W0622 06:45:57.549477   20051 manager.go:340] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
Jun 22 06:45:57 dev-master hyperkube[20051]: I0622 06:45:57.659932   20051 kubelet_node_status.go:289] Setting node annotation to enable volume controller attach/detach
Jun 22 06:45:57 dev-master hyperkube[20051]: I0622 06:45:57.661447   20051 cpu_manager.go:155] [cpumanager] starting with none policy
Jun 22 06:45:57 dev-master hyperkube[20051]: I0622 06:45:57.661459   20051 cpu_manager.go:156] [cpumanager] reconciling every 10s
Jun 22 06:45:57 dev-master hyperkube[20051]: I0622 06:45:57.661468   20051 policy_none.go:42] [cpumanager] none policy: Start
Jun 22 06:45:57 dev-master hyperkube[20051]: W0622 06:45:57.661523   20051 fs.go:539] stat failed on /dev/loop10 with error: no such file or directory
Jun 22 06:45:57 dev-master hyperkube[20051]: F0622 06:45:57.661535   20051 kubelet.go:1359] Failed to start ContainerManager failed to get rootfs info: failed to get device for dir "/var/lib/kubelet": could not find device with major: 0, minor: 126 in cached partitions map
Jun 22 06:45:57 dev-master systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jun 22 06:45:57 dev-master systemd[1]: kubelet.service: Failed with result 'exit-code'.

Upvotes: 3

Views: 17498

Answers (5)

Zekeriya
Zekeriya

Reputation: 1

This problem can also arise if the service is not enabled after the Docker installation. After the next reboot, Docker is not started and kubelet cannot start either. So don't forget after the Docker installation:

sudo systemctl enable docker

Upvotes: -1

Danilo Pianini
Danilo Pianini

Reputation: 1106

user1188867's answer is definitely correct.

I want to add a piece of information for further reference for those not using Ubuntu. I hit this on a cluster with Clear Linux on bare metal. I'm attaching here instructions on how to detect the issue in such environment and disable swap to solve.

First, launching sudo systemctl status kubelet after reboot produces the following:

● kubelet.service - kubelet: The Kubernetes Node Agent 
    Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
   Drop-In: /etc/systemd/system/kubelet.service.d
            └─0-cni.conf, 0-containerd.conf, 10-cgroup-driver.conf
    Active: activating (auto-restart) (Result: exit-code) since Thu 2020-12-17 11:04:37 CET; 2s ago
      Docs: http://kubernetes.io/docs/
   Process: 2404 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, st> 
   Process: 2405 ExecStartPost=/usr/bin/kubelet-version-check.sh store (code=exited, status=0/SUCCESS)
  Main PID: 2404 (code=exited, status=255/EXCEPTION)
       CPU: 683ms

The issue was actually the existence of the swap file. To disable it:

  1. add the nofail option to /usr/lib/systemd/system/var-swapfile.swap. I personally used the following one-liner: sudo sed -i s/Options=/Options=nofail,/ /usr/lib/systemd/system/var-swapfile.swap
  2. Disable swapping: sudo swapoff -a
  3. Delete the swap file: sudo rm /var/swapfile

This procedure on Clear Linux persists the deactivation of the swap on reboots.

Upvotes: 2

user1188867
user1188867

Reputation: 3978

Run the following command on all your nodes. It worked for me.

 swapoff -a

Upvotes: 6

Daniel Watrous
Daniel Watrous

Reputation: 3861

I had the same exit status, but my kubelet failed to start due to a limitation on the number of max_user_watches. The following got the kubelet working again https://github.com/google/cadvisor/issues/1581#issuecomment-367616070

Upvotes: 0

Artem Golenyaev
Artem Golenyaev

Reputation: 2757

I have found a lot of the same error messages in your logs:

dial tcp 10.50.50.201:8001: getsockopt: connection refused

There may be several problems:

  • IP address and/or Port are incorrect
  • no access from the Worker to the Master
  • something wrong with your Master, for example, kube-apiserver is down

You should look in that direction.

Upvotes: 2

Related Questions