Reputation: 741
I'm a little confused about why my rancher-agent is no longer able to connect to the cluster server. This had been working for a long time, but it appears to have broken on its own. DNS and networking confuse me.
My setup:
Ubuntu 20.04.6 LTS
Docker version 23.0.1
v2.6.5
I have configured my cluster to run a single node, as specified here, and then I followed the advanced setup instructions to run rancher/rancher and rancher/rancher-agent on the same node.
Everything boots and runs. I can access all the applications in my cluster from https://homelab.local and everything loads and runs. My Rancher admin UI boots on https://homelab.local:8443/dashboard/home. The issue is that I cannot manage the cluster at all.
I see these two errors under Cluster Management:
Unsupported Docker version found [23.0.1] on host [192.168.0.75], supported versions are [1.13.x 17.03.x 17.06.x 17.09.x 18.06.x 18.09.x 19.03.x 20.10.x]
and
[Disconnected] Cluster agent is not connected
So it appears that I have inadvertently upgraded Docker and this is breaking my cluster?
When I run kubectl get nodes, I get some kind of cert error:
kubectl get nodes
E0326 19:56:23.504726 70231 memcache.go:265] couldn't get current server API group list: Get "https://localhost:8443/api?timeout=32s": x509: certificate signed by unknown authority
E0326 19:56:23.506701 70231 memcache.go:265] couldn't get current server API group list: Get "https://localhost:8443/api?timeout=32s": x509: certificate signed by unknown authority
E0326 19:56:23.508357 70231 memcache.go:265] couldn't get current server API group list: Get "https://localhost:8443/api?timeout=32s": x509: certificate signed by unknown authority
E0326 19:56:23.510425 70231 memcache.go:265] couldn't get current server API group list: Get "https://localhost:8443/api?timeout=32s": x509: certificate signed by unknown authority
E0326 19:56:23.513743 70231 memcache.go:265] couldn't get current server API group list: Get "https://localhost:8443/api?timeout=32s": x509: certificate signed by unknown authority
Unable to connect to the server: x509: certificate signed by unknown authority
How can I get my cluster back to a good state?
I uninstalled the latest Docker with:
sudo apt-get remove docker-ce docker-ce-cli docker-ce-rootless-extras docker-compose-plugin docker-scan-plugin docker-buildx-plugin
And installed Rancher's supported version like this:
curl https://releases.rancher.com/install-docker/20.10.sh | sh
This fixes the issue with the unsupported Docker version, but the rancher-agent image is still not booting up. When I look at the logs of the container, I see this:
time="2023-03-27T03:20:59Z" level=fatal msg="Certificate chain is not complete, please check if all needed intermediate certificates are included in the server certificate (in the correct order) and if the cacerts setting in Rancher either contains the correct CA certificate (in the case of using self signed certificates) or is empty (in the case of using a certificate signed by a recognized CA). Certificate information is displayed above. error: Get \"https://192.168.0.75:8443\": x509: certificate signed by unknown authority"
Upvotes: 0
Views: 9808
Reputation: 540
The official Kubernetes docs do not apply to rke2 installs. There is no /etc/kubernetes created when you install Rancher rke2.
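Since the kubeconfig location depends on how the cluster was installed, one option is to probe a few candidate paths and use the first readable one. This is only a sketch: first_kubeconfig is a hypothetical helper name, and the paths in the usage comment are assumed defaults (/etc/rancher/rke2/rke2.yaml for RKE2, /etc/kubernetes/admin.conf for kubeadm-style installs), so adjust them for your setup.

```shell
#!/bin/sh
# Print the first readable kubeconfig among the given candidate paths.
# Returns non-zero if none of the candidates is readable.
first_kubeconfig() {
    for candidate in "$@"; do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage (assumed default locations, not verified for this cluster):
# export KUBECONFIG="$(first_kubeconfig /etc/rancher/rke2/rke2.yaml /etc/kubernetes/admin.conf "$HOME/.kube/config")"
```

Note that the readability check runs as the current user, so a root-only file such as the RKE2 kubeconfig will be skipped unless you run this as root.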
Upvotes: 0
Reputation: 1676
As explained in this official doc, the error "Unable to connect to the server: x509: certificate signed by unknown authority" indicates a possible certificate mismatch.
If you get this error when you run kubectl get pods, try the troubleshooting methods below:
1) Verify that the $HOME/.kube/config file contains a valid certificate, and regenerate the certificate if necessary. The certificates in a kubeconfig file are base64 encoded. The base64 --decode command can be used to decode the certificate, and openssl x509 -text -noout can be used to view the certificate information.
2) Unset the KUBECONFIG environment variable using:
unset KUBECONFIG
Or set it to the default KUBECONFIG location:
export KUBECONFIG=/etc/kubernetes/admin.conf
3) Another workaround is to overwrite the existing kubeconfig for the "admin" user:
mv $HOME/.kube $HOME/.kube.bak
mkdir $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Refer to this official doc for more information.
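The certificate check in method 1 could look roughly like the sketch below. It assumes the CA is embedded inline in the kubeconfig as certificate-authority-data (rather than referenced through a certificate-authority file path) and only reads the first such entry. kubeconfig_ca_pem is a hypothetical helper name, not part of kubectl.

```shell
#!/bin/sh
# Print the PEM-encoded CA certificate embedded in a kubeconfig file.
# Assumption: the CA is stored inline as certificate-authority-data;
# only the first matching entry is decoded.
kubeconfig_ca_pem() {
    config="${1:-$HOME/.kube/config}"
    awk '$1 == "certificate-authority-data:" { print $2; exit }' "$config" \
        | base64 --decode
}

# Usage: view the subject, issuer, and validity dates of the embedded CA.
# kubeconfig_ca_pem | openssl x509 -text -noout
```

If the issuer shown there does not match the CA that signed the certificate your API server presents, that would be consistent with the "certificate signed by unknown authority" error.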
Upvotes: 0