Reputation: 11
Hi guys, I have an 11-node Kubernetes cluster with Cilium 1.12.1 (kubeProxyReplacement=strict)
built on bare metal in our data center. Pods on 4 of the nodes (node05-node08) have issues communicating with pods or services that are not on the same node; the other 7 nodes don't have the issue. I can ping the other pods' IPs, but when I telnet to a port, the packets never seem to arrive.
All 11 nodes run the same OS version and kernel, and the cluster is deployed with Kubespray; I made sure the 11 nodes had the same software environment as much as possible. (I'm not sure if it has anything to do with the hardware, but the 4 problematic nodes have gigabit NICs while the others all have 10-gigabit NICs.)
This is the node list:
❯ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master01 Ready control-plane 39h v1.24.4 10.252.55.22 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
master02 Ready control-plane 39h v1.24.4 10.252.54.44 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
master03 Ready control-plane 39h v1.24.4 10.252.55.39 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node05 Ready <none> 39h v1.24.4 10.252.34.27 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node06 Ready <none> 39h v1.24.4 10.252.33.44 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node07 Ready <none> 39h v1.24.4 10.252.33.52 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node08 Ready <none> 39h v1.24.4 10.252.33.45 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node01 Ready <none> 39h v1.24.4 10.252.144.206 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node02 Ready <none> 39h v1.24.4 10.252.145.13 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node03 Ready <none> 39h v1.24.4 10.252.145.163 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
node04 Ready <none> 39h v1.24.4 10.252.145.226 <none> CentOS Linux 7 (Core) 5.10.0-1.0.0.17 containerd://1.6.8
This is what happens in a pod on node05 when communicating with an nginx pod running on master01:
# ping works fine
bash-5.1# ping 10.233.64.103
PING 10.233.64.103 (10.233.64.103) 56(84) bytes of data.
64 bytes from 10.233.64.103: icmp_seq=1 ttl=63 time=0.214 ms
64 bytes from 10.233.64.103: icmp_seq=2 ttl=63 time=0.148 ms
--- 10.233.64.103 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1026ms
rtt min/avg/max/mdev = 0.148/0.181/0.214/0.033 ms
# curl not working
bash-5.1# curl 10.233.64.103
curl: (28) Failed to connect to 10.233.64.103 port 80 after 3069 ms: Operation timed out
# hubble observe logs (hubble observe --to-ip 10.233.64.103 -f):
Sep 6 03:15:16.100: cilium-test/testubuntu-g2gv6 (ID:9268) -> cilium-test/nginx-deployment-bpvnx (ID:4221) to-overlay FORWARDED (ICMPv4 EchoRequest)
Sep 6 03:15:16.100: cilium-test/testubuntu-g2gv6 (ID:9268) -> cilium-test/nginx-deployment-bpvnx (ID:4221) to-endpoint FORWARDED (ICMPv4 EchoRequest)
Sep 6 03:15:22.026: cilium-test/testubuntu-g2gv6:33722 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: SYN)
This is what happens in a pod on node04 when communicating with the same nginx pod:
# ping works fine
bash-5.1# ping 10.233.64.103
PING 10.233.64.103 (10.233.64.103) 56(84) bytes of data.
64 bytes from 10.233.64.103: icmp_seq=1 ttl=63 time=2.33 ms
64 bytes from 10.233.64.103: icmp_seq=2 ttl=63 time=2.30 ms
# curl works fine as well
bash-5.1# curl 10.233.64.103
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
# hubble observe logs (hubble observe --to-ip 10.233.64.103 -f):
Sep 6 03:16:24.808: cilium-test/testubuntu-wcwfg (ID:9268) -> cilium-test/nginx-deployment-bpvnx (ID:4221) to-overlay FORWARDED (ICMPv4 EchoRequest)
Sep 6 03:16:24.810: cilium-test/testubuntu-wcwfg (ID:9268) -> cilium-test/nginx-deployment-bpvnx (ID:4221) to-endpoint FORWARDED (ICMPv4 EchoRequest)
Sep 6 03:16:27.043: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: SYN)
Sep 6 03:16:27.045: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-endpoint FORWARDED (TCP Flags: SYN)
Sep 6 03:16:27.045: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: ACK)
Sep 6 03:16:27.045: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: ACK, PSH)
Sep 6 03:16:27.047: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-endpoint FORWARDED (TCP Flags: ACK)
Sep 6 03:16:27.047: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Sep 6 03:16:27.048: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: ACK, FIN)
Sep 6 03:16:27.050: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-endpoint FORWARDED (TCP Flags: ACK, FIN)
Sep 6 03:16:27.050: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-overlay FORWARDED (TCP Flags: ACK)
Sep 6 03:16:27.051: cilium-test/testubuntu-wcwfg:57802 (ID:9268) -> cilium-test/nginx-deployment-bpvnx:80 (ID:4221) to-endpoint FORWARDED (TCP Flags: ACK)
This is the cilium-health status, which also shows the port connectivity issue on the 4 nodes:
❯ kubectl exec -it -n kube-system ds/cilium -- cilium-health status
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), clean-cilium-state (init)
Probe time: 2022-09-06T03:10:24Z
Nodes:
node01 (localhost):
Host connectivity to 10.252.144.206:
ICMP to stack: OK, RTT=341.295µs
HTTP to agent: OK, RTT=100.729µs
Endpoint connectivity to 10.233.67.53:
ICMP to stack: OK, RTT=334.224µs
HTTP to agent: OK, RTT=163.289µs
master01:
Host connectivity to 10.252.55.22:
ICMP to stack: OK, RTT=1.994728ms
HTTP to agent: OK, RTT=1.610932ms
Endpoint connectivity to 10.233.64.235:
ICMP to stack: OK, RTT=2.100332ms
HTTP to agent: OK, RTT=2.489126ms
master02:
Host connectivity to 10.252.54.44:
ICMP to stack: OK, RTT=2.33033ms
HTTP to agent: OK, RTT=2.34166ms
Endpoint connectivity to 10.233.65.225:
ICMP to stack: OK, RTT=2.101561ms
HTTP to agent: OK, RTT=2.067012ms
master03:
Host connectivity to 10.252.55.39:
ICMP to stack: OK, RTT=1.688641ms
HTTP to agent: OK, RTT=1.593428ms
Endpoint connectivity to 10.233.66.74:
ICMP to stack: OK, RTT=2.210915ms
HTTP to agent: OK, RTT=1.725555ms
node05:
Host connectivity to 10.252.34.27:
ICMP to stack: OK, RTT=2.383001ms
HTTP to agent: OK, RTT=2.48362ms
Endpoint connectivity to 10.233.70.87:
ICMP to stack: OK, RTT=2.194843ms
HTTP to agent: Get "http://10.233.70.87:4240/hello": dial tcp 10.233.70.87:4240: connect: connection timed out
node06:
Host connectivity to 10.252.33.44:
ICMP to stack: OK, RTT=2.091932ms
HTTP to agent: OK, RTT=1.724729ms
Endpoint connectivity to 10.233.71.119:
ICMP to stack: OK, RTT=1.984056ms
HTTP to agent: Get "http://10.233.71.119:4240/hello": dial tcp 10.233.71.119:4240: connect: connection timed out
node07:
Host connectivity to 10.252.33.52:
ICMP to stack: OK, RTT=2.055482ms
HTTP to agent: OK, RTT=2.037437ms
Endpoint connectivity to 10.233.72.47:
ICMP to stack: OK, RTT=1.853614ms
HTTP to agent: Get "http://10.233.72.47:4240/hello": dial tcp 10.233.72.47:4240: connect: connection timed out
node08:
Host connectivity to 10.252.33.45:
ICMP to stack: OK, RTT=2.461315ms
HTTP to agent: OK, RTT=2.369003ms
Endpoint connectivity to 10.233.74.247:
ICMP to stack: OK, RTT=2.097029ms
HTTP to agent: Get "http://10.233.74.247:4240/hello": dial tcp 10.233.74.247:4240: connect: connection timed out
node02:
Host connectivity to 10.252.145.13:
ICMP to stack: OK, RTT=372.787µs
HTTP to agent: OK, RTT=168.915µs
Endpoint connectivity to 10.233.73.98:
ICMP to stack: OK, RTT=360.354µs
HTTP to agent: OK, RTT=287.224µs
node03:
Host connectivity to 10.252.145.163:
ICMP to stack: OK, RTT=363.072µs
HTTP to agent: OK, RTT=216.652µs
Endpoint connectivity to 10.233.68.73:
ICMP to stack: OK, RTT=312.153µs
HTTP to agent: OK, RTT=304.981µs
node04:
Host connectivity to 10.252.145.226:
ICMP to stack: OK, RTT=375.121µs
HTTP to agent: OK, RTT=185.484µs
Endpoint connectivity to 10.233.69.140:
ICMP to stack: OK, RTT=403.752µs
HTTP to agent: OK, RTT=277.517µs
Any suggestions on where I should start troubleshooting?
Upvotes: 1
Views: 2840
Reputation: 13133
It's hard to say for sure without at least the full config and a Cilium sysdump, but I suspect the issue is that some of your NIC drivers don't support XDP.
You wrote: "I'm not sure if it has anything to do with the hardware, but the 4 problematic nodes were gigabit NIC servers and the others were all 10 gigabit NICs."
That points to an issue with the NICs or their drivers. The only Cilium feature that depends on the NIC driver is XDP acceleration.
If you have enabled that feature and the four problematic nodes use a NIC driver that doesn't support XDP (or supports it only partially), that could explain why they fail to communicate with other nodes.
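To narrow it down, you could check whether XDP acceleration is active (`cilium status --verbose` reports it) and, if so, turn it off as a test. Assuming the cluster's Cilium is configured through Helm values, a sketch of the relevant setting would be:

```yaml
# Helm values sketch (assumption: Helm-managed Cilium install).
# "disabled" makes the load balancer fall back to the regular tc/BPF
# path, which works with any NIC driver regardless of XDP support.
loadBalancer:
  acceleration: disabled
```

If connectivity on node05-node08 recovers with this setting, the NIC drivers' missing or partial XDP support is the likely culprit.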
Upvotes: 0
Reputation: 292
Since version 1.12, Cilium has changed its routing heavily. Try enabling legacy host routing.
If you deploy with Helm, add this to your helm_values.yaml:
bpf:
  hostLegacyRouting: true
This option configures whether direct routing mode should route traffic via the host stack (true) or directly and more efficiently out of BPF (false), if the kernel supports it. The latter also bypasses netfilter in the host namespace.
You can read more about BPF host routing in the official Cilium docs. Pay attention to the compatibility of the node OS and kernel with BPF.
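Applying the change with Helm might look like the following (the release name `cilium`, the `kube-system` namespace, and the `cilium/cilium` chart are assumptions; adjust them to your setup):

```shell
# Switch Cilium to legacy host routing without touching other values,
# then restart the agents so the new datapath configuration takes effect.
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set bpf.hostLegacyRouting=true
kubectl -n kube-system rollout restart daemonset/cilium
```

Note that this trades some datapath performance for compatibility, so treat it as a diagnostic step as much as a fix.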
Upvotes: -1