RommelTJ

Reputation: 741

Rancher and Kubernetes, Unable to connect to the server: x509: certificate signed by unknown authority

I'm a little confused about why my rancher-agent is no longer able to connect to the cluster server. This worked for a long time, but it appears to have broken on its own. DNS and networking confuse me.

My setup:

I have configured my cluster to run a single node, as specified here, and then I followed the advanced setup instructions to run rancher/rancher and rancher/rancher-agent on the same node.

The issue

Everything boots and runs. I can access all my applications in my cluster from https://homelab.local and everything loads and runs. My rancher admin UI boots on https://homelab.local:8443/dashboard/home. The issue is that I cannot manage the cluster at all.

I see these two errors under Cluster Management:

Unsupported Docker version found [23.0.1] on host [192.168.0.75], supported versions are [1.13.x 17.03.x 17.06.x 17.09.x 18.06.x 18.09.x 19.03.x 20.10.x]

and

[Disconnected] Cluster agent is not connected

So it appears that I have inadvertently upgraded Docker and this is breaking my cluster?

When I run kubectl get nodes, I get a certificate error:

kubectl get nodes
E0326 19:56:23.504726   70231 memcache.go:265] couldn't get current server API group list: Get "https://localhost:8443/api?timeout=32s": x509: certificate signed by unknown authority
E0326 19:56:23.506701   70231 memcache.go:265] couldn't get current server API group list: Get "https://localhost:8443/api?timeout=32s": x509: certificate signed by unknown authority
E0326 19:56:23.508357   70231 memcache.go:265] couldn't get current server API group list: Get "https://localhost:8443/api?timeout=32s": x509: certificate signed by unknown authority
E0326 19:56:23.510425   70231 memcache.go:265] couldn't get current server API group list: Get "https://localhost:8443/api?timeout=32s": x509: certificate signed by unknown authority
E0326 19:56:23.513743   70231 memcache.go:265] couldn't get current server API group list: Get "https://localhost:8443/api?timeout=32s": x509: certificate signed by unknown authority
Unable to connect to the server: x509: certificate signed by unknown authority

How can I get my cluster back to a good state?

Update

I uninstalled the latest Docker with:

sudo apt-get remove docker-ce docker-ce-cli docker-ce-rootless-extras docker-compose-plugin docker-scan-plugin docker-buildx-plugin

And installed Rancher's supported version like this:

curl https://releases.rancher.com/install-docker/20.10.sh | sh

This fixed the unsupported Docker version error, but the rancher-agent container still does not start. When I look at the container logs, I see this:

time="2023-03-27T03:20:59Z" level=fatal msg="Certificate chain is not complete, please check if all needed intermediate certificates are included in the server certificate (in the correct order) and if the cacerts setting in Rancher either contains the correct CA certificate (in the case of using self signed certificates) or is empty (in the case of using a certificate signed by a recognized CA). Certificate information is displayed above. error: Get \"https://192.168.0.75:8443\": x509: certificate signed by unknown authority"

Upvotes: 0

Views: 9808

Answers (2)

mr.zog

Reputation: 540

The official Kubernetes docs do not apply to rke2 installs. There is no /etc/kubernetes created when you install Rancher rke2.
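As a rough illustration of the point above, the sketch below picks the first kubeconfig that actually exists on the host instead of assuming /etc/kubernetes/admin.conf. The function name first_kubeconfig is hypothetical, and the path /etc/rancher/rke2/rke2.yaml is an assumption about a default RKE2 install; adjust it if your setup differs.

```shell
#!/bin/sh
# Hypothetical helper: print the first existing file among the given
# candidate kubeconfig paths; return non-zero if none of them exists.
first_kubeconfig() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate locations (assumptions about default installs):
#   RKE2:    /etc/rancher/rke2/rke2.yaml
#   kubeadm: /etc/kubernetes/admin.conf
```

It could be used as export KUBECONFIG="$(first_kubeconfig /etc/rancher/rke2/rke2.yaml /etc/kubernetes/admin.conf)". These files are usually readable only by root, so kubectl may still need to run with elevated privileges.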

Upvotes: 0

Sai Chandini Routhu

Reputation: 1676

As explained in this official doc, the error "Unable to connect to the server: x509: certificate signed by unknown authority" indicates a possible certificate mismatch.

If you get this error when you run

kubectl get pods

try the troubleshooting methods below:

1) Verify that the $HOME/.kube/config file contains a valid certificate, and regenerate the certificate if necessary. The certificates in a kubeconfig file are base64 encoded. Use base64 --decode to decode a certificate and openssl x509 -text -noout to view its details.

2) Unset the KUBECONFIG environment variable using:

unset KUBECONFIG

Or set it to the admin kubeconfig that kubeadm creates:

export KUBECONFIG=/etc/kubernetes/admin.conf

3) Another workaround is to overwrite the existing kubeconfig for the "admin" user:

    mv $HOME/.kube $HOME/.kube.bak
    mkdir $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
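For method 1, a minimal sketch of the decode step might look like the following. The function name kubeconfig_ca_pem is hypothetical, and the sketch assumes the kubeconfig embeds the CA inline in a certificate-authority-data field (it reads only the first such field) rather than referencing a file with certificate-authority.

```shell
#!/bin/sh
# Hypothetical helper: print the decoded CA certificate stored in a kubeconfig.
# Assumes an inline "certificate-authority-data:" field; only the first
# occurrence is used. Defaults to $HOME/.kube/config when no path is given.
kubeconfig_ca_pem() {
    config_file="${1:-$HOME/.kube/config}"
    # The field value is a single base64 string on the same line as the key.
    awk '/certificate-authority-data:/ { print $2; exit }' "$config_file" \
        | base64 --decode
}
```

Piping the result into openssl, for example kubeconfig_ca_pem | openssl x509 -noout -issuer -subject -dates, shows who issued the CA and whether it has expired. If that CA is not the one that signed the certificate the server presents, kubectl reports the "certificate signed by unknown authority" error.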

Refer to this official doc for more information.

Upvotes: 0
