Reputation: 69
I have installed a Kubernetes cluster on Azure with kubespray 2.13.2.
But after installing some pods of my data platform components, I noticed that pods running on the same node cannot reach each other through a service.
For example, my Presto coordinator has to access the Hive metastore. Here are the services in my namespace:
kubectl get svc -n ai-developer
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
metastore ClusterIP 10.233.12.66 <none> 9083/TCP 4h53m
The Hive metastore service is called metastore, through which my Presto coordinator has to reach the Hive metastore pod.
Let's see the following pods in my namespace:
kubectl get po -n ai-developer -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
metastore-5544f95b6b-cqmkx 1/1 Running 0 9h 10.233.69.20 minion-3 <none> <none>
presto-coordinator-796c4c7bcd-7lngs 1/1 Running 0 5h32m 10.233.69.29 minion-3 <none> <none>
presto-worker-0 1/1 Running 0 5h32m 10.233.67.52 minion-1 <none> <none>
presto-worker-1 1/1 Running 0 5h32m 10.233.70.24 minion-4 <none> <none>
presto-worker-2 1/1 Running 0 5h31m 10.233.68.24 minion-2 <none> <none>
presto-worker-3 1/1 Running 0 5h31m 10.233.71.27 minion-0 <none> <none>
Note that the Hive metastore pod metastore-5544f95b6b-cqmkx is running on node minion-3, where the Presto coordinator pod presto-coordinator-796c4c7bcd-7lngs is also running.
I have configured the Hive metastore URL thrift://metastore:9083 in the Hive catalog properties of the Presto coordinator.
When a Presto pod runs on the same node as the Hive metastore pod, it cannot reach the metastore, but pods running on other nodes reach the metastore through the service just fine.
This is just one example; I have run into several other cases like it so far.
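One way to confirm the symptom (a sketch; it assumes nc is available in the Presto images, which may not be the case) is to probe the service both from the pod on the same node as the metastore and from a pod on another node:

```shell
# From the coordinator pod (same node as the metastore), probe the service port
kubectl -n ai-developer exec presto-coordinator-796c4c7bcd-7lngs -- \
  nc -vz -w 2 metastore 9083
# From a worker pod on a different node, the same probe should succeed
kubectl -n ai-developer exec presto-worker-0 -- \
  nc -vz -w 2 metastore 9083
```

If the first probe times out while the second succeeds, that matches the same-node-only failure described above.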
kubenet is installed as the network plugin in my Kubernetes cluster, which was installed with kubespray on Azure:
/usr/local/bin/kubelet --logtostderr=true --v=2 --node-ip=10.240.0.4 --hostname-override=minion-3 --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --config=/etc/kubernetes/kubelet-config.yaml --kubeconfig=/etc/kubernetes/kubelet.conf --pod-infra-container-image=k8s.gcr.io/pause:3.1 --runtime-cgroups=/systemd/system.slice --hairpin-mode=promiscuous-bridge --network-plugin=kubenet --cloud-provider=azure --cloud-config=/etc/kubernetes/cloud_config
Any idea?
Upvotes: 2
Views: 1942
Reputation: 888
I was using flannel as the CNI on Kubernetes v1.30.1, and it turned out that flannel needs masquerading to be enabled, while the kube-proxy default is masqueradeAll: false.
Changing it to true and restarting the kube-proxy pods solved the problem (AND I FINALLY GOT TO SOLVE IT AT 4 AM!!!).
The steps:
kubectl -n kube-system edit cm kube-proxy to set masqueradeAll: true
kubectl -n kube-system delete pod -l k8s-app=kube-proxy to restart all kube-proxy pods
Upvotes: 1
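If you prefer not to open an editor, the same two steps can be done non-interactively (a sketch; it assumes the flag appears in the kube-proxy ConfigMap exactly as masqueradeAll: false):

```shell
# Flip masqueradeAll from false to true in the kube-proxy ConfigMap
kubectl -n kube-system get cm kube-proxy -o yaml \
  | sed 's/masqueradeAll: false/masqueradeAll: true/' \
  | kubectl apply -f -
# Restart all kube-proxy pods so they pick up the new config
kubectl -n kube-system delete pod -l k8s-app=kube-proxy
```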
Reputation: 1310
In my case it was the br_netfilter module, which did not survive a reboot, so the vxlan overlay did not work.
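To check for the module and make it survive reboots, something like this should work (a sketch; the modules-load.d path is the conventional systemd location, and the filename is arbitrary):

```shell
# Check whether br_netfilter is currently loaded; load it if not
lsmod | grep br_netfilter || sudo modprobe br_netfilter
# Persist the module across reboots via systemd's modules-load.d
echo br_netfilter | sudo tee /etc/modules-load.d/br_netfilter.conf
```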
Upvotes: 0
Reputation: 69
After I changed the kube-proxy mode from ipvs to iptables, it works fine!
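A sketch of that change (it assumes the mode is stored in the kube-proxy ConfigMap, as it is in kubeadm/kubespray clusters):

```shell
# Check which proxy mode kube-proxy is configured with
kubectl -n kube-system get cm kube-proxy -o yaml | grep 'mode:'
# Edit the ConfigMap and set "mode: iptables"
kubectl -n kube-system edit cm kube-proxy
# Restart the kube-proxy pods so the change takes effect
kubectl -n kube-system delete pod -l k8s-app=kube-proxy
```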
Upvotes: 0
Reputation: 11
Please check whether the default policy of the iptables FORWARD chain is ACCEPT. In my case, changing the FORWARD chain default policy from DROP to ACCEPT made communication between nodes work again.
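A quick way to check and change it on the affected node (a sketch; requires root):

```shell
# Show the FORWARD chain default policy; the first line of output looks
# like "Chain FORWARD (policy DROP)" or "(policy ACCEPT)"
sudo iptables -L FORWARD -n | head -n 1
# Flip the default policy to ACCEPT
sudo iptables -P FORWARD ACCEPT
```

Note that this change is not persistent across reboots on its own; whatever set the policy to DROP (often Docker or a firewall service) may set it again.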
Upvotes: 1
Reputation: 6765
You might be able to overcome this issue by using the fully qualified name Kubernetes provides for resolving service IPs, as described in the k8s docs.
In your case it will probably mean changing your thrift://metastore:9083
property to thrift://metastore.ai-developer.svc.cluster.local:9083
(assuming, of course, your cluster domain is configured to be cluster.local
).
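You can verify that the fully qualified name resolves from inside a pod before changing the property (pod and namespace names taken from the question; this assumes nslookup is available in the image):

```shell
# Resolve the service FQDN from inside the coordinator pod
kubectl -n ai-developer exec presto-coordinator-796c4c7bcd-7lngs -- \
  nslookup metastore.ai-developer.svc.cluster.local
```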
Upvotes: 0