Environmental Info: K3s Version: k3s version v1.24.3+k3s1 (990ba0e8) go version go1.18.1
Node(s) CPU architecture, OS, and Version: Five RPI 4s Running Headless 64-bit Raspbian, each with following information Linux 5.15.56-v8+ #1575 SMP PREEMPT Fri Jul 22 20:31:26 BST 2022 aarch64 GNU/Linux
Cluster Configuration: 3 Nodes configured as control plane, 2 Nodes as Worker Nodes
Describe the bug: The pods coredns-b96499967-ktgtc, local-path-provisioner-7b7dc8d6f5-5cfds, metrics-server-668d979685-9szb9, traefik-7cd4fcff68-gfmhm, and svclb-traefik-aa9f6b38-j27sw are in an Unknown status, with 0/1 containers ready. As a result, the cluster DNS service does not work, and pods are not able to resolve internal or external names
Steps To Reproduce:
Expected behavior: The system pods should be Running, with a known status. Additionally, DNS should work, which means that, among other things, headless services should work and pods should be able to resolve hostnames inside and outside the cluster
Actual behavior: The DNS pods are stuck in an Unknown state, pods are unable to resolve hostnames inside or outside the cluster, and headless services do not work
Additional context / logs:
kubectl -n kube-system get configmap coredns -o go-template={{.data.Corefile}}
.:53 {
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    hosts /etc/coredns/NodeHosts {
        ttl 60
        reload 15s
        fallthrough
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
import /etc/coredns/custom/*.server
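The Corefile above matches the stock K3s CoreDNS configuration, so the problem is more likely with the pods themselves than with the DNS configuration. As a quick sanity check (not part of the original report; the busybox image and lookup targets are just conventional choices), DNS resolution can be tested from a throwaway pod:

```shell
# Run a temporary pod and try to resolve an in-cluster name.
# If CoreDNS is down, the lookup will time out.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.35 -- \
  nslookup kubernetes.default.svc.cluster.local

# Try an external name as well, to separate cluster-DNS failures
# from upstream resolver failures.
kubectl run dns-test2 --rm -it --restart=Never --image=busybox:1.35 -- \
  nslookup example.com
```

If the in-cluster lookup fails but the external one also fails, the problem is with CoreDNS itself rather than with the upstream forwarder.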
Description of Relevant Pods:
kubectl describe pods --namespace=kube-system
Name: coredns-b96499967-ktgtc
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: master0/192.168.0.68
Start Time: Fri, 05 Aug 2022 16:09:38 +0100
Labels: k8s-app=kube-dns
pod-template-hash=b96499967
Annotations: <none>
Status: Running
IP:
IPs: <none>
Controlled By: ReplicaSet/coredns-b96499967
Containers:
coredns:
Container ID: containerd://1a83a59275abdb7b783aa06eb56cb1e5367c1ca196598851c2b7d5154c0a4bb9
Image: rancher/mirrored-coredns-coredns:1.9.1
Image ID: docker.io/rancher/mirrored-coredns-coredns@sha256:35e38f3165a19cb18c65d83334c13d61db6b24905f45640aa8c2d2a6f55ebcb0
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Terminated
Reason: Unknown
Exit Code: 255
Started: Fri, 05 Aug 2022 19:19:19 +0100
Finished: Fri, 05 Aug 2022 19:20:29 +0100
Ready: False
Restart Count: 8
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=2s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/etc/coredns/custom from custom-config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zbbxf (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
custom-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns-custom
Optional: true
kube-api-access-zbbxf:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 41d (x419 over 41d) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 64m (x11421 over 42h) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 2m24s (x139 over 32m) kubelet Pod sandbox changed, it will be killed and re-created.
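The repeated SandboxChanged events (tens of thousands of repetitions) suggest the kubelet is tearing down and recreating the pod sandbox in a loop, which usually points at the container runtime or CNI on the node rather than at CoreDNS itself. One way to investigate, as a sketch of generic diagnostics run on the affected node (master0 in this report), is to check the embedded runtime's logs and state:

```shell
# k3s runs the kubelet and containerd embedded, so its journal
# carries both; filter for sandbox/CNI-related messages.
sudo journalctl -u k3s --since "1 hour ago" | grep -iE "sandbox|cni" | tail -n 50

# Inspect pod sandboxes directly via k3s's bundled crictl.
sudo k3s crictl pods --state notready
```

Sandboxes stuck in a NotReady state here would confirm the churn is happening at the runtime level.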
Name: metrics-server-668d979685-9szb9
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: master0/192.168.0.68
Start Time: Fri, 05 Aug 2022 16:09:38 +0100
Labels: k8s-app=metrics-server
pod-template-hash=668d979685
Annotations: <none>
Status: Running
IP:
IPs: <none>
Controlled By: ReplicaSet/metrics-server-668d979685
Containers:
metrics-server:
Container ID: containerd://cd02643f7d7bc78ea98abdec20558626cfac39f70e1127b2281342dd00905e44
Image: rancher/mirrored-metrics-server:v0.5.2
Image ID: docker.io/rancher/mirrored-metrics-server@sha256:48ecad4fe641a09fa4459f93c7ad29d4916f6b9cf7e934d548f1d8eff96e2f35
Port: 4443/TCP
Host Port: 0/TCP
Args:
--cert-dir=/tmp
--secure-port=4443
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--kubelet-use-node-status-port
--metric-resolution=15s
State: Terminated
Reason: Unknown
Exit Code: 255
Started: Fri, 05 Aug 2022 19:19:19 +0100
Finished: Fri, 05 Aug 2022 19:20:29 +0100
Ready: False
Restart Count: 8
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get https://:https/livez delay=60s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:https/readyz delay=0s timeout=1s period=2s #success=1 #failure=3
Environment: <none>
Mounts:
/tmp from tmp-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-djqgk (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-djqgk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 41d (x418 over 41d) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 64m (x11427 over 42h) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 2m27s (x141 over 32m) kubelet Pod sandbox changed, it will be killed and re-created.
Name: traefik-7cd4fcff68-gfmhm
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: master0/192.168.0.68
Start Time: Fri, 05 Aug 2022 16:10:43 +0100
Labels: app.kubernetes.io/instance=traefik
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=traefik
helm.sh/chart=traefik-10.19.300
pod-template-hash=7cd4fcff68
Annotations: prometheus.io/path: /metrics
prometheus.io/port: 9100
prometheus.io/scrape: true
Status: Running
IP:
IPs: <none>
Controlled By: ReplicaSet/traefik-7cd4fcff68
Containers:
traefik:
Container ID: containerd://779a1596fb204a7577acda97e9fb3f4c5728cf1655071d8e5faad6a8d407d217
Image: rancher/mirrored-library-traefik:2.6.2
Image ID: docker.io/rancher/mirrored-library-traefik@sha256:ad2226527eea71b7591d5e9dcc0bffd0e71b2235420c34f358de6db6d529561f
Ports: 9100/TCP, 9000/TCP, 8000/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
Args:
--global.checknewversion
--global.sendanonymoususage
--entrypoints.metrics.address=:9100/tcp
--entrypoints.traefik.address=:9000/tcp
--entrypoints.web.address=:8000/tcp
--entrypoints.websecure.address=:8443/tcp
--api.dashboard=true
--ping=true
--metrics.prometheus=true
--metrics.prometheus.entrypoint=metrics
--providers.kubernetescrd
--providers.kubernetesingress
--providers.kubernetesingress.ingressendpoint.publishedservice=kube-system/traefik
--entrypoints.websecure.http.tls=true
State: Terminated
Reason: Unknown
Exit Code: 255
Started: Fri, 05 Aug 2022 19:19:19 +0100
Finished: Fri, 05 Aug 2022 19:20:29 +0100
Ready: False
Restart Count: 8
Liveness: http-get http://:9000/ping delay=10s timeout=2s period=10s #success=1 #failure=3
Readiness: http-get http://:9000/ping delay=10s timeout=2s period=10s #success=1 #failure=1
Environment: <none>
Mounts:
/data from data (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jw4qc (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-jw4qc:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 41d (x415 over 41d) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 64m (x11418 over 42h) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 2m30s (x141 over 32m) kubelet Pod sandbox changed, it will be killed and re-created.
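All three failing pods are scheduled on master0 and show the same sandbox churn, so checking the health of the node itself is worthwhile. These are generic diagnostics, not steps from the original post:

```shell
# Node conditions (MemoryPressure, DiskPressure, PIDPressure, Ready)
kubectl describe node master0 | sed -n '/Conditions:/,/Addresses:/p'

# Resource headroom on the node; a Raspberry Pi 4 can hit
# memory pressure easily under a control-plane workload.
kubectl top node master0
```

Note that kubectl top depends on metrics-server, which is itself failing here, so it may return an error until the pods recover.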
The solution that I found to resolve the problem, at least for now, was to manually restart all of the kube-system deployments, which can be listed with the command
kubectl get deployments --namespace=kube-system
If all of them are similarly not ready, they can be restarted one at a time using the command
kubectl -n kube-system rollout restart <deployment>
Specifically, the coredns, local-path-provisioner, metrics-server, and traefik deployments all needed to be restarted.
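If every deployment in the namespace needs the same treatment, the two steps can be combined; this is a generic one-liner rather than something from the original answer:

```shell
# Restart every deployment in kube-system in one pass...
kubectl -n kube-system get deployments -o name \
  | xargs -n1 kubectl -n kube-system rollout restart

# ...then wait for each rollout to complete.
kubectl -n kube-system get deployments -o name \
  | xargs -n1 kubectl -n kube-system rollout status --timeout=120s
```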