Reputation: 2455
Context
I installed Docker following this instruction on my Ubuntu 18.04 LTS (Server) and later installed Kubernetes via kubeadm. After initializing (kubeadm init --pod-network-cidr=10.10.10.10/24) and joining a second node (I have a two-node cluster to start with), I cannot get coredns, nor the Web UI (Dashboard) I applied later, to actually reach status Running.
As pod network I tried both Flannel (kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml) and Weave Net. Nothing changed. The pods still show status ContainerCreating, even after hours of waiting:
Question
Why doesn't the container creation work as expected and what might be the root cause for this? And most importantly: How do I solve this?
Edit
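Summing up my answer below, here are the reasons why:
- Docker was using cgroupfs instead of systemd as its cgroup driver
- iptables was not set up correctly (legacy mode)
- kubeadm init was run with the wrong --pod-network-cidr, since flannel's standard yaml requires it to be 10.244.0.0/16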
Upvotes: 1
Views: 2131
Reputation: 141
I had the same issue as described above. For me, the question author's solution was the right fix, but other pods were also stuck in either Pending or ContainerCreating. On top of that fix, my flannel deployment had hit an error I hadn't noticed, so I needed to re-apply the flannel manifest:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
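To spot which pods (flannel's own included) are still not Running after re-applying, a small filter over kubectl's plain-text output can help. This is a sketch that assumes the default column layout of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, ...); the function name stuck_pods is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods --all-namespaces` output on stdin and
# print "namespace/name status" for every pod that is neither Running nor Completed.
# Assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
stuck_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Usage:
#   kubectl get pods --all-namespaces | stuck_pods
```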
Upvotes: 3
Reputation: 2455
Since answering this question took me a lot of time, I wanted to share what got me out of this. There might be more code here than necessary, but I also want everything in one place in case I or someone else has to redo all the steps.
First it all started with Docker...
I figured out that it presumably all started with the way I installed Docker. Following the linked online instructions, I installed Docker with sudo apt-get install docker.io, added my user to the docker group with sudo usermod -aG docker $USER, and left Docker on its default cgroupfs cgroup driver.
Well, according to the official Kubernetes instructions this was a mistake: systemd is the recommended cgroup driver!
So I completely purged everything I had ever done with Docker by following these great instructions from Mayur Bhandare:
sudo apt-get purge -y docker-engine docker docker.io docker-ce
sudo apt-get autoremove -y --purge docker-engine docker docker.io docker-ce
sudo rm -rf /var/lib/docker /etc/docker
sudo rm -f /etc/apparmor.d/docker
sudo groupdel docker
sudo rm -rf /var/run/docker.sock
# Reboot to be sure
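Before reinstalling, it can be worth confirming that the purge really removed everything. A rough sketch, assuming the same paths as the commands above; docker_leftovers is a hypothetical helper, and the optional root argument only exists so it can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Docker paths from the purge step that still exist.
# Optional $1 is a filesystem root prefix (empty = the real system).
docker_leftovers() {
  local root=${1:-}
  local p
  for p in /var/lib/docker /etc/docker /etc/apparmor.d/docker /var/run/docker.sock; do
    # -e is true for files, directories and sockets alike
    if [ -e "$root$p" ]; then
      echo "$p"
    fi
  done
  # Only check the docker group when inspecting the real system
  if [ -z "$root" ] && getent group docker >/dev/null 2>&1; then
    echo "group:docker"
  fi
}

# Usage: docker_leftovers   (no output means nothing is left over)
```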
Afterwards I reinstalled Docker the official way (keep in mind that this might change in the future):
# Install Docker CE
## Set up the repository:
### Install packages to allow apt to use a repository over HTTPS
apt-get update && apt-get install -y \
apt-transport-https ca-certificates curl software-properties-common gnupg2
### Add Docker’s official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
### Add Docker apt repository.
add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
## Install Docker CE.
apt-get update && apt-get install -y \
containerd.io=1.2.10-3 \
docker-ce=5:19.03.4~3-0~ubuntu-$(lsb_release -cs) \
docker-ce-cli=5:19.03.4~3-0~ubuntu-$(lsb_release -cs)
# Setup daemon.
cat > /etc/docker/daemon.json <<EOF
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2"
}
EOF
mkdir -p /etc/systemd/system/docker.service.d
# Restart docker.
systemctl daemon-reload
systemctl restart docker
Note that this explicitly sets systemd as Docker's cgroup driver!
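After the restart, it is worth checking that the running daemon actually picked up the new setting. A minimal sketch; it assumes docker info --format '{{.CgroupDriver}}' is available (it is on the Docker CE versions I'm aware of), and docker_uses_systemd is a hypothetical name. Passing the driver name as an argument skips the docker call, which is handy for trying it out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if Docker's cgroup driver is systemd.
# $1 (optional) is the driver name; if omitted, the running daemon is queried.
docker_uses_systemd() {
  local driver=$1
  if [ -z "$driver" ]; then
    driver=$(docker info --format '{{.CgroupDriver}}' 2>/dev/null)
  fi
  if [ "$driver" = "systemd" ]; then
    return 0
  fi
  echo "Docker cgroup driver is '${driver:-unknown}', expected 'systemd'" >&2
  return 1
}

# Usage (after systemctl restart docker):
#   docker_uses_systemd && echo "cgroup driver OK"
```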
... and then it went on with Flannel...
Above I wrote that I ran sudo kubeadm init with --pod-network-cidr=10.10.10.10/24, since that was the IP of my master.
Well, as pointed out here, not using the officially recommended --pod-network-cidr=10.244.0.0/16 results in errors, for example when using kubectl proxy, or in container creation after applying the provided manifest with kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml.
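Part of the problem with 10.10.10.10/24 shows up with plain arithmetic: it is a host address rather than a network address (host bits are set), and its network 10.10.10.0/24 overlaps the node's own address, whereas the pod network should be a separate range. (As far as I know, the controller manager also hands each node its own /24 out of the pod CIDR by default, so a /24 cluster-wide range would not leave room for a second node.) The sketch below checks both points in pure bash; the function names are hypothetical and it only handles IPv4.

```shell
#!/usr/bin/env bash
# Hypothetical IPv4-only helpers for sanity-checking a --pod-network-cidr value.

# Dotted quad -> 32-bit integer
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# 32-bit integer -> dotted quad
int_to_ip() {
  local n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Netmask for a prefix length, e.g. 24 -> 0xFFFFFF00 (bash arithmetic is 64-bit)
prefix_mask() {
  echo $(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF ))
}

# Network address of a CIDR: 10.10.10.10/24 -> 10.10.10.0/24
cidr_network() {
  local ip=${1%/*} len=${1#*/}
  echo "$(int_to_ip $(( $(ip_to_int "$ip") & $(prefix_mask "$len") )))/$len"
}

# Succeeds if the CIDR has no host bits set
cidr_is_network() {
  [ "$(cidr_network "$1")" = "$1" ]
}

# Succeeds if two CIDRs share any address (compare under the shorter prefix)
cidr_overlaps() {
  local ip1=${1%/*} l1=${1#*/} ip2=${2%/*} l2=${2#*/}
  local mask
  mask=$(prefix_mask $(( l1 < l2 ? l1 : l2 )))
  (( ($(ip_to_int "$ip1") & mask) == ($(ip_to_int "$ip2") & mask) ))
}

# Usage with the values from this question (master IP as a /32):
#   cidr_network 10.10.10.10/24                    # prints 10.10.10.0/24
#   cidr_overlaps 10.10.10.10/24 10.10.10.10/32    # true: overlaps the node
#   cidr_overlaps 10.244.0.0/16 10.10.10.10/32     # false
```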
This is because 10.244.0.0/16 is hard-coded in the .yaml and is therefore mandatory, unless you change it in the .yaml yourself.
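If you do want to keep a different pod network, the value to change is the "Network" entry of net-conf.json inside kube-flannel.yml. A sketch with sed, assuming the manifest contains a line of the form "Network": "10.244.0.0/16" (as the flannel manifests I've seen do); patch_flannel_network is a hypothetical name, and the same CIDR then has to be passed to kubeadm init --pod-network-cidr.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the "Network" value of flannel's net-conf.json
# while streaming a kube-flannel.yml from stdin to stdout.
# Assumes the manifest contains a line like:  "Network": "10.244.0.0/16",
patch_flannel_network() {
  local cidr=$1
  sed -E "s#(\"Network\":[[:space:]]*\")[^\"]+(\")#\1${cidr}\2#"
}

# Usage (the same CIDR must go to kubeadm init --pod-network-cidr):
#   curl -fsSL <kube-flannel.yml URL> | patch_flannel_network 10.32.0.0/16 | kubectl apply -f -
```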
To get rid of the wrong configuration I did a full reset. This can be done with sudo kubeadm reset and by deleting the config with sudo rm -r ~/.kube/config.
Anyhow, since I had messed things up so badly, I went further and uninstalled and reinstalled kubeadm, making sure it used the legacy iptables backend this time (which I had also forgotten to do before...).
Here is a nice link on how to fully uninstall all kubeadm parts.
sudo kubeadm reset
sudo apt-get purge kubeadm kubectl kubelet kubernetes-cni kube*
sudo apt-get autoremove
sudo rm -rf ~/.kube
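kubeadm reset itself warns that it does not clean up CNI configuration (/etc/cni/net.d) or iptables rules, so leftovers from the earlier flannel setup can survive the reset. A cautious sketch that only prints the cleanup commands unless DRY_RUN=0 is set; cni0 and flannel.1 are the interface names flannel's vxlan backend normally creates, and cleanup_cni_leftovers is a hypothetical name. iptables rules are deliberately left alone, since flushing them can also remove unrelated firewall rules.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove CNI leftovers that `kubeadm reset` does not clean up.
# Prints the commands by default; set DRY_RUN=0 to actually run them.
cleanup_cni_leftovers() {
  local run=echo
  if [ "${DRY_RUN:-1}" = "0" ]; then
    run=""
  fi
  # CNI network configs (e.g. flannel's 10-flannel.conflist)
  $run sudo rm -rf /etc/cni/net.d
  # Bridge and VXLAN interfaces created by flannel; ignore them if absent
  $run sudo ip link delete cni0 2>/dev/null || true
  $run sudo ip link delete flannel.1 2>/dev/null || true
}

# Usage:
#   cleanup_cni_leftovers            # review the commands first
#   DRY_RUN=0 cleanup_cni_leftovers  # then run them
```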
For the sake of completeness, here is the reinstall as well:
# ensure legacy binaries are installed
sudo apt-get install -y iptables arptables ebtables
# switch to legacy versions
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo update-alternatives --set arptables /usr/sbin/arptables-legacy
sudo update-alternatives --set ebtables /usr/sbin/ebtables-legacy
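To check which backend iptables is actually using after the switch, the version string is enough: iptables 1.8 and later append (legacy) or (nf_tables). Ubuntu 18.04 ships iptables 1.6.x, which prints no suffix and only has the legacy backend, so on that release the switch above should mostly be a no-op (and the *-legacy binaries may not even exist). A sketch; iptables_backend is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which backend iptables uses.
# $1 (optional) is the output of `iptables --version`; if omitted, it is queried.
# iptables < 1.8 (e.g. 1.6.x) prints no suffix and only has the legacy backend.
iptables_backend() {
  local version=$1
  if [ -z "$version" ]; then
    version=$(iptables --version 2>/dev/null)
  fi
  case "$version" in
    *nf_tables*) echo "nf_tables" ;;
    *legacy*) echo "legacy" ;;
    "iptables v1."[0-7].*) echo "legacy" ;;
    *) echo "unknown" ;;
  esac
}

# Usage: [ "$(iptables_backend)" = legacy ] || echo "iptables is not in legacy mode"
```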
# Install Kubernetes with kubeadm
sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
#reboot
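apt-mark hold prevents apt from upgrading kubelet, kubeadm and kubectl behind your back, which would otherwise let them drift out of step with the cluster. A small check over apt-mark showhold output; packages_held is a hypothetical name. Note that the apt.kubernetes.io repository used above has since been deprecated and frozen in favor of pkgs.k8s.io, so a fresh install today needs the newer repository instructions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if every named package appears in the given
# `apt-mark showhold` output (one package name per line).
packages_held() {
  local held=$1
  shift
  local pkg
  for pkg in "$@"; do
    if ! printf '%s\n' "$held" | grep -qx -- "$pkg"; then
      echo "not held: $pkg" >&2
      return 1
    fi
  done
}

# Usage:
#   packages_held "$(apt-mark showhold)" kubelet kubeadm kubectl && echo "all held"
```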
... and finally it worked!
After the clean reinstallation I did the following:
# Initialize with correct cidr
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
And then be astounded by the result:
kubectl get pods --all-namespaces
On a side note: this also resolved the "/run/flannel/subnet.env: no such file or directory" error I had seen before these steps when running kubectl describe on the coredns pods that were stuck in creation.
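That file is a useful health signal: flannel writes /run/flannel/subnet.env on each node once it has leased a subnet, and the flannel CNI plugin reads it when pod sandboxes are created. A sketch that checks the file exists and that FLANNEL_NETWORK matches the CIDR passed to kubeadm init; check_flannel_subnet_env is a hypothetical name, and the default path is the one from the error message.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check flannel's per-node subnet file.
# $1 = CIDR passed to kubeadm init --pod-network-cidr
# $2 = path to the file (defaults to the path from the error message)
check_flannel_subnet_env() {
  local expected=$1
  local file=${2:-/run/flannel/subnet.env}
  if [ ! -f "$file" ]; then
    echo "missing $file: flannel has not configured this node yet" >&2
    return 1
  fi
  local network
  network=$(sed -n 's/^FLANNEL_NETWORK=//p' "$file")
  if [ "$network" != "$expected" ]; then
    echo "FLANNEL_NETWORK is '$network', expected '$expected'" >&2
    return 1
  fi
  # Print this node's pod subnet, e.g. 10.244.1.1/24
  sed -n 's/^FLANNEL_SUBNET=//p' "$file"
}

# Usage (on each node): check_flannel_subnet_env 10.244.0.0/16
```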
Upvotes: 5