Reputation: 89

Flannel is crashing for Slave node

I am getting this result for flannel service on my slave node. Flannel is running fine on master node.

kube-system   kube-flannel-ds-amd64-xbtrf      0/1     CrashLoopBackOff   4          3m5s

Kube-proxy running on the slave is fine but not the flannel pod.

I have only a master and a slave node. At first the pod shows Running, then it goes to Error, and finally to CrashLoopBackOff.

godfrey@master:~$ kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                             READY   STATUS             RESTARTS   AGE     IP                NODE     NOMINATED NODE   READINESS GATES
kube-system   kube-flannel-ds-amd64-jszwx      0/1     CrashLoopBackOff   4          2m17s   192.168.152.104   slave3   <none>           <none>
kube-system   kube-proxy-hxs6m                 1/1     Running            0          18m     192.168.152.104   slave3   <none>           <none>

I am also getting this from the logs:

I0515 05:14:53.975822       1 main.go:390] Found network config - Backend type: vxlan
I0515 05:14:53.975856       1 vxlan.go:121] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E0515 05:14:53.976072       1 main.go:291] Error registering network: failed to acquire lease: node "slave3" pod cidr not assigned
I0515 05:14:53.976154       1 main.go:370] Stopping shutdownHandler...

I could not find a solution so far. Help appreciated.

Upvotes: 3

Answers (2)

VAS

Reputation: 9041

The described cluster configuration doesn't look correct in two respects:

First of all, a reasonable size for the PodCIDR is /16. Each Kubernetes node usually gets its own /24 subnet out of that range, because a node can run up to 110 pods by default.
PodCIDR and ServicesCIDR (default: "10.96.0.0/12") must not overlap with your existing LAN network or with each other.

So, correct kubeadm command would look like:

$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16

In your case the PodCIDR subnet is only a /24, and it was assigned to the master node. The slave node didn't get its own /24 subnet, so the Flannel Pod showed this error in the logs:

Error registering network: failed to acquire lease: node "slave3" pod cidr not assigned

Assigning the same subnet to several nodes manually will lead to other connectivity problems.

You can find more details on Kubernetes IP subnets in the GKE documentation.
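The subnet arithmetic behind the first problem can be sketched in shell. This is only an illustration: node_subnets is a hypothetical helper name, and it assumes each node is handed a /24 slice of the PodCIDR, as described above.

```shell
# Hypothetical helper: how many per-node subnets of size /node_prefix
# fit into a pod CIDR of size /cluster_prefix.
node_subnets() {
  cluster_prefix=$1
  node_prefix=$2
  # A node subnet larger than the whole pod CIDR cannot be allocated at all.
  if [ "$node_prefix" -lt "$cluster_prefix" ]; then
    echo 0
    return
  fi
  echo $(( 1 << (node_prefix - cluster_prefix) ))
}

node_subnets 16 24   # a /16 pod CIDR leaves room for 256 node subnets
node_subnets 24 24   # a /24 pod CIDR leaves room for exactly 1 node subnet
```

With a /24 PodCIDR the single available slice goes to the first node (here the master), which is consistent with the "pod cidr not assigned" error on the second node.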

The second problem is the IP subnet number.

Recent versions of the Calico network add-on are able to detect the correct Pod subnet from the kubeadm parameter --pod-network-cidr. Older versions used the predefined subnet 192.168.0.0/16, and you had to adjust it in the DaemonSet specification in the Calico YAML file:

         - name: CALICO_IPV4POOL_CIDR
           value: "192.168.0.0/16"

Flannel still requires its default subnet (10.244.0.0/16) to be specified for kubeadm init.
To use a custom subnet for your cluster, the Flannel installation YAML file must be adjusted before it is applied to the cluster:

...
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
...
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
...
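To check which subnet a running Flannel deployment is actually configured with, the "Network" value can be pulled out of that ConfigMap. A minimal sketch follows; flannel_network is a hypothetical helper name, and the usage line assumes the ConfigMap name and namespace from the manifest above (kube-flannel-cfg in kube-system).

```shell
# Hypothetical helper: print the "Network" value from a flannel
# net-conf.json document read on stdin.
flannel_network() {
  sed -n 's/.*"Network"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage against a live cluster (requires kubectl access):
# kubectl -n kube-system get configmap kube-flannel-cfg \
#   -o jsonpath='{.data.net-conf\.json}' | flannel_network
```

The printed value should match the subnet that was passed to kubeadm init with --pod-network-cidr.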

So the following should work for any version of Kubernetes and Calico:

$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Latest Calico version
$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

# or specific version, v3.14 in this case, which is also latest at the moment
# kubectl apply -f https://docs.projectcalico.org/v3.14/manifests/calico.yaml

Same for Flannel:

$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config

# For Kubernetes v1.7+ 
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml    

# For older versions of Kubernetes:
# For RBAC enabled clusters:
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-legacy.yml

There are many other network add-ons. You can find the list in the Kubernetes documentation.

Upvotes: 1

Mark Watney

Reputation: 5960

As the solution came from the OP, I'm posting this answer as a community wiki.

As the OP reported in the comments, he didn't pass the podCIDR during kubeadm init.

The following command was used to see that the flannel pod was in the "CrashLoopBackOff" state:

sudo kubectl get pods --all-namespaces -o wide

To confirm that the podCIDR was not passed to the flannel pod kube-flannel-ds-amd64-ksmmh, which was in the CrashLoopBackOff state, its logs were checked:

$ kubectl logs -n kube-system kube-flannel-ds-amd64-ksmmh

Running kubeadm init --pod-network-cidr=172.168.10.0/24 didn't pass the podCIDR to the slave nodes as expected.

Hence, to solve the problem, the following command had to be used to pass the podCIDR to each slave node:

kubectl patch node slave1 -p '{"spec":{"podCIDR":"172.168.10.0/24"}}'

Please see the "Kubernetes Specific" section of coreos.com/flannel/docs/latest/troubleshooting.html.
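To see which nodes still lack a podCIDR before or after patching, the node list can be filtered as in the sketch below. nodes_without_cidr is a hypothetical helper name; it expects one line per node in the form name, TAB, podCIDR, which is what the jsonpath expression in the usage comment prints.

```shell
# Hypothetical helper: print the names of nodes whose podCIDR column is empty.
# Input on stdin: one line per node, "<node-name><TAB><podCIDR>".
nodes_without_cidr() {
  awk -F'\t' '$2 == "" { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
# kubectl get nodes \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}' \
#   | nodes_without_cidr
```

Any node name printed here would be expected to hit the same "pod cidr not assigned" error in its flannel pod.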

Upvotes: 1
