azurepancake

Reputation: 861

Kubernetes: Pods Can't Resolve Hostnames

I am encountering an issue with Kubernetes where my pods cannot resolve hostnames (such as google.com or kubernetes.default).

I currently have 1 master and 1 node running on two CentOS7 instances in OpenStack. I deployed using kubeadm.

Here are the versions installed:

kubeadm-1.7.3-1.x86_64
kubectl-1.7.3-1.x86_64
kubelet-1.7.3-1.x86_64
kubernetes-cni-0.5.1-0.x86_64

Below are the verification steps I ran, which may give some insight into the problem.

I define a busybox pod:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always

And then create the pod:

$ kubectl create -f busybox.yaml

Try to perform a DNS lookup of name google.com:

$ kubectl exec -ti busybox -- nslookup google.com
Server:    10.96.0.10
Address 1: 10.96.0.10
nslookup: can't resolve 'google.com'

Try to perform a DNS lookup of the name kubernetes.default:

$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10
nslookup: can't resolve 'kubernetes.default'

Check if my DNS pod is running:

$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                        READY     STATUS    RESTARTS   AGE
kube-dns-2425271678-k1nft   3/3       Running   9          5d

Check if my DNS service is up:

$ kubectl get svc --namespace=kube-system
NAME       CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
kube-dns   10.96.0.10   <none>        53/UDP,53/TCP   5d

Check if DNS endpoints are exposed:

$ kubectl get ep kube-dns --namespace=kube-system
NAME       ENDPOINTS                     AGE
kube-dns   10.244.0.5:53,10.244.0.5:53   5d

Check the contents of /etc/resolv.conf in my container:

$ kubectl exec -ti busybox -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
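
A note on the search and ndots lines above: with a glibc-style resolver, a name that has fewer dots than ndots is tried against each search domain before it is tried as written. The sketch below is a simplified model of that ordering, not the resolver's actual code, and busybox's nslookup may not follow it exactly. The function name `candidates` is made up for illustration.

```shell
# candidates NAME NDOTS DOMAIN...
# Print, in order, the names a glibc-style resolver would be expected to try.
candidates() {
  name=$1; ndots=$2; shift 2
  case $name in
    *.) printf '%s\n' "$name"; return 0 ;;   # trailing dot: already absolute
  esac
  # Count the dots by deleting every other character.
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s.\n' "$name"                   # enough dots: literal name first
  fi
  for domain in "$@"; do
    printf '%s.%s.\n' "$name" "$domain"      # then each search domain in turn
  done
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s.\n' "$name"                   # too few dots: literal name last
  fi
}

# Values taken from the resolv.conf shown above.
candidates kubernetes.default 5 default.svc.cluster.local svc.cluster.local cluster.local
# kubernetes.default.default.svc.cluster.local.
# kubernetes.default.svc.cluster.local.
# kubernetes.default.cluster.local.
# kubernetes.default.
```

Under this model, google.com (one dot) also goes through all three cluster search domains before it is queried as google.com, so every lookup depends on the cluster DNS service answering.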

If I understand correctly, the Kubernetes documentation states that my pods should inherit the DNS configuration of the node (or master?). However, even with just one line in the node's /etc/resolv.conf (nameserver 10.92.128.40), I receive the below warning when spinning up a pod:

Search Line limits were exceeded, some DNS names were omitted, and the applied search line is: default.svc.cluster.local svc.cluster.local cluster.local mydomain.net anotherdomain.net yetanotherdomain.net

I understand there exists a known issue where only so many items can be listed in /etc/resolv.conf. However, where would the above search line and nameserver in my container be generated from?
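
For context on where that line comes from: with the default ClusterFirst DNS policy, the kubelet is generally understood to write the pod's resolv.conf itself, using the cluster DNS service IP as the nameserver and putting the cluster search domains in front of the node's own search domains. The sketch below models only the search-line part. It assumes a limit of 6 search domains (the classic resolver limit, which matches the six entries in the warning above) and ignores the separate character-length limit. `build_search_line` and `fourth.example` are hypothetical names for illustration.

```shell
# build_search_line NAMESPACE CLUSTER_DOMAIN NODE_DOMAIN...
# Print the search line a pod would be expected to get; report dropped
# domains on stderr.
build_search_line() {
  namespace=$1; cluster_domain=$2; shift 2
  max_domains=6      # assumed limit; newer kubelets may allow more
  line=""; count=0
  for domain in "$namespace.svc.$cluster_domain" "svc.$cluster_domain" "$cluster_domain" "$@"; do
    if [ "$count" -ge "$max_domains" ]; then
      echo "omitted: $domain" >&2
      continue
    fi
    line="${line:+$line }$domain"   # join with single spaces
    count=$((count + 1))
  done
  printf '%s\n' "$line"
}

# Cluster domains first, then the node's search domains (one too many here).
build_search_line default cluster.local mydomain.net anotherdomain.net yetanotherdomain.net fourth.example
# stdout: default.svc.cluster.local svc.cluster.local cluster.local mydomain.net anotherdomain.net yetanotherdomain.net
# stderr: omitted: fourth.example
```

If this model holds, the warning is about the node's search domains being cut off and is separate from the failure to resolve names at all.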

Finally here are the logs from the kube-dns container:

$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c kubedns
I0817 20:54:58.445280       1 dns.go:48] version: 1.14.3-4-gee838f6
I0817 20:54:58.452551       1 server.go:70] Using configuration read from directory: /kube-dns-config with period 10s
I0817 20:54:58.452616       1 server.go:113] FLAG: --alsologtostderr="false"
I0817 20:54:58.452628       1 server.go:113] FLAG: --config-dir="/kube-dns-config"
I0817 20:54:58.452638       1 server.go:113] FLAG: --config-map=""
I0817 20:54:58.452643       1 server.go:113] FLAG: --config-map-namespace="kube-system"
I0817 20:54:58.452650       1 server.go:113] FLAG: --config-period="10s"
I0817 20:54:58.452659       1 server.go:113] FLAG: --dns-bind-address="0.0.0.0"
I0817 20:54:58.452665       1 server.go:113] FLAG: --dns-port="10053"
I0817 20:54:58.452674       1 server.go:113] FLAG: --domain="cluster.local."
I0817 20:54:58.452683       1 server.go:113] FLAG: --federations=""
I0817 20:54:58.452692       1 server.go:113] FLAG: --healthz-port="8081"
I0817 20:54:58.452698       1 server.go:113] FLAG: --initial-sync-timeout="1m0s"
I0817 20:54:58.452704       1 server.go:113] FLAG: --kube-master-url=""
I0817 20:54:58.452713       1 server.go:113] FLAG: --kubecfg-file=""
I0817 20:54:58.452718       1 server.go:113] FLAG: --log-backtrace-at=":0"
I0817 20:54:58.452727       1 server.go:113] FLAG: --log-dir=""
I0817 20:54:58.452734       1 server.go:113] FLAG: --log-flush-frequency="5s"
I0817 20:54:58.452741       1 server.go:113] FLAG: --logtostderr="true"
I0817 20:54:58.452746       1 server.go:113] FLAG: --nameservers=""
I0817 20:54:58.452752       1 server.go:113] FLAG: --stderrthreshold="2"
I0817 20:54:58.452759       1 server.go:113] FLAG: --v="2"
I0817 20:54:58.452765       1 server.go:113] FLAG: --version="false"
I0817 20:54:58.452775       1 server.go:113] FLAG: --vmodule=""
I0817 20:54:58.452856       1 server.go:176] Starting SkyDNS server (0.0.0.0:10053)
I0817 20:54:58.453680       1 server.go:198] Skydns metrics enabled (/metrics:10055)
I0817 20:54:58.453692       1 dns.go:147] Starting endpointsController
I0817 20:54:58.453699       1 dns.go:150] Starting serviceController
I0817 20:54:58.453841       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0817 20:54:58.453852       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0817 20:54:58.964468       1 dns.go:171] Initialized services and endpoints from apiserver
I0817 20:54:58.964523       1 server.go:129] Setting up Healthz Handler (/readiness)
I0817 20:54:58.964536       1 server.go:134] Setting up cache handler (/cache)
I0817 20:54:58.964545       1 server.go:120] Status HTTP port 8081

The dnsmasq container. Disregard that it found several more nameservers than just the one I said was in my resolv.conf, as I did have more in there originally. I attempted to simplify it by removing the extras:

$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c dnsmasq
I0817 20:55:03.295826       1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0817 20:55:03.298134       1 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0817 20:55:03.731577       1 nanny.go:111] 
W0817 20:55:03.731609       1 nanny.go:112] Got EOF from stdout
I0817 20:55:03.731642       1 nanny.go:108] dnsmasq[9]: started, version 2.76 cachesize 1000
I0817 20:55:03.731656       1 nanny.go:108] dnsmasq[9]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0817 20:55:03.731681       1 nanny.go:108] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain ip6.arpa 
I0817 20:55:03.731689       1 nanny.go:108] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa 
I0817 20:55:03.731695       1 nanny.go:108] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain cluster.local 
I0817 20:55:03.731704       1 nanny.go:108] dnsmasq[9]: reading /etc/resolv.conf
I0817 20:55:03.731710       1 nanny.go:108] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain ip6.arpa 
I0817 20:55:03.731717       1 nanny.go:108] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa 
I0817 20:55:03.731723       1 nanny.go:108] dnsmasq[9]: using nameserver 127.0.0.1#10053 for domain cluster.local 
I0817 20:55:03.731729       1 nanny.go:108] dnsmasq[9]: using nameserver 10.92.128.40#53
I0817 20:55:03.731735       1 nanny.go:108] dnsmasq[9]: using nameserver 10.92.128.41#53
I0817 20:55:03.731741       1 nanny.go:108] dnsmasq[9]: using nameserver 10.95.207.66#53
I0817 20:55:03.731747       1 nanny.go:108] dnsmasq[9]: read /etc/hosts - 7 addresses

And the sidecar container:

$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c sidecar
ERROR: logging before flag.Parse: I0817 20:55:04.488391       1 main.go:48] Version v1.14.3-4-gee838f6
ERROR: logging before flag.Parse: I0817 20:55:04.488612       1 server.go:45] Starting server (options {DnsMasqPort:53 DnsMasqAddr:127.0.0.1 DnsMasqPollIntervalMs:5000 Probes:[{Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1} {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}] PrometheusAddr:0.0.0.0 PrometheusPort:10054 PrometheusPath:/metrics PrometheusNamespace:kubedns})
ERROR: logging before flag.Parse: I0817 20:55:04.488667       1 dnsprobe.go:75] Starting dnsProbe {Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}
ERROR: logging before flag.Parse: I0817 20:55:04.488766       1 dnsprobe.go:75] Starting dnsProbe {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}

I have mostly been reading the documentation provided here. Any direction, insight, or things to try would be much appreciated.

Upvotes: 53

Views: 137787

Answers (11)

Giovani Meza

Reputation: 1

Unable to resolve cluster.local or external domains in Kubernetes

I solved my problem by editing the CoreDNS ConfigMap. I had to comment out the forward section in order to resolve service domain names in the cluster (cluster.local). In addition, I had to add a section for my external domain to resolve hostnames on my network. I'm not currently using internet services; I might add that later.

  1. OS: RHEL9_4
  2. CNI and pod network: bridge type 172.20.0.0/16
  3. dnsDomain: 10.96.0.10 #default
  4. Kubernetes: v1.31.2

My final CoreDNS ConfigMap:

kubectl -n kube-system edit configmap coredns
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        ## Commented lines
        #forward . /etc/resolv.conf {
        #   max_concurrent 1000
        #}
        cache 30
        loop
        reload
        loadbalance
    }
    # new section added for external hosts
    super.domain.local:53 {
        errors
        cache 30
        forward . 192.168.0.20 # here goes your real DNS server for super.domain.local
    }
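
How a Corefile with several server blocks routes a query can be sketched as a longest-suffix match on the zone names. As far as the CoreDNS documentation describes it, the block with the most specific matching zone handles the query. The sketch below is a simplified model under that assumption: it ignores ports and letter case, and `pick_zone` and `db01` are hypothetical names.

```shell
# pick_zone QNAME ZONE...
# Print the zone of the server block expected to handle QNAME.
pick_zone() {
  qname=${1%.}; shift
  best=""; best_len=-1
  for zone in "$@"; do
    z=${zone%.}                     # "." becomes the empty string (root zone)
    if [ -z "$z" ]; then
      len=0                         # root matches everything, least specific
    else
      case ".$qname" in
        *".$z") len=${#z} ;;        # qname equals the zone or is below it
        *) continue ;;              # not in this zone
      esac
    fi
    if [ "$len" -gt "$best_len" ]; then
      best=$zone; best_len=$len
    fi
  done
  printf '%s\n' "$best"
}

# The two server blocks from the Corefile above: "." and "super.domain.local".
pick_zone db01.super.domain.local . super.domain.local                # super.domain.local
pick_zone kubernetes.default.svc.cluster.local . super.domain.local   # .
pick_zone google.com . super.domain.local                             # .
```

Under this model, cluster names and everything else land in the `.` block, where the kubernetes plugin answers cluster.local and, with forward commented out, nothing forwards other names upstream.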

Upvotes: 0

Gaurav Toshniwal

Reputation: 3722

In my case, the EKS nodes' security group was not allowing DNS traffic (port 53). After I allowed it, DNS resolution started working fine.

Upvotes: 0

Gareth Swaffield

Reputation: 1

We just had a similar issue on an old Kubernetes cluster (v1.20.5+k3s1) using the k3s agent: the node could resolve DNS, but none of the pods could. After much searching, we found that disabling IPv6 on the node fixed it.

Upvotes: 0

runout

Reputation: 81

My case was the simplest of all, I guess: the backend init container could not reach the Postgres pod by its hostname, because the pod hostname changed when I repackaged it with Helm. In other words, the hostname I was looking for was wrong.

Some details:

  • I configured an initContainer in the backend pod to check DB availability before starting the backend app. That worked fine:
    ...
        initContainers:
        - name: wait-for-db
          image: postgres:13-alpine
          command: [ "sh", "-c", "until pg_isready -h db -p 5432 -U postgres:postgres; do echo 'not yet'; sleep 2; done" ]
    ...
    
  • Then I repackaged my app and the DB as a Helm chart, in different pods, so the template for the backend looked like this:
    ...
        initContainers:
        - name: wait-for-db
          image: {{ $db_info.image }}:{{ $db_info.version }}
          command: [ "sh", "-c", "until pg_isready -h db -p {{ (first $db_info.service.ports).port }} -U postgres:postgres; do echo 'not yet'; sleep 2; done" ]
    ...
    
  • The only problem was that Helm adds the release name to the pod name, so the name of my DB pod changed from db-0 to myfancyapp-db-0, and the init container couldn't reach it.
  • The solution was to add .Release.Name to the database hostname in the template, so it looks like this:
    ...
        initContainers:
        - name: wait-for-db
          image: {{ $db_info.image }}:{{ $db_info.version }}
          command: [ "sh", "-c", "until pg_isready -h {{ .Release.Name }}-db -p {{ (first $db_info.service.ports).port }} -U postgres:postgres; do echo 'not yet'; sleep 2; done" ]
    ...
    
    Notice the change from -h db to -h {{ .Release.Name }}-db

Thanks to the other people in this topic: they mentioned that it could be something with hostname resolution, which gave me the clue that the problem could be the hostname itself. This Helm behavior might not be obvious when you are taking your first steps with Kubernetes and Helm, like myself.

Upvotes: 0

Yaroslav Fedorov

Reputation: 361

I faced a similar problem when I brought up a cluster on VirtualBox. It turned out that my flannel was using the interface with address 10.0.2.15:

kubectl get pod --namespace kube-system -l app=flannel

NAME                    READY   STATUS     RESTARTS   AGE
kube-flannel-ds-5dxdm   1/1     Running    0          10s
kube-flannel-ds-7z6jt   1/1     Running    0          6s
kube-flannel-ds-vqwrl   1/1     Running    0          3s

and then:

kubectl logs --namespace kube-system kube-flannel-ds-5dxdm -c kube-flannel

I0622 17:53:13.690431       1 main.go:463] Found network config - Backend type: vxlan
I0622 17:53:13.690716       1 match.go:248] Using interface with name enp0s3 and address 10.0.2.15
I0622 17:53:13.690734       1 match.go:270] Defaulting external address to interface address (10.0.2.15)

I added --iface=enp0s8 to the args:

kubectl edit DaemonSet/kube-flannel-ds --namespace kube-system


  containers:
  - name: kube-flannel
    image: quay.io/coreos/flannel:v0.10.0-amd64
    command:
    - /opt/bin/flanneld
    args:
    - --ip-masq
    - --kube-subnet-mgr
    - --iface=enp0s8

These threads helped me find the solution: "configuring flannel to use a non default interface in kubernetes" and https://github.com/flannel-io/flannel/blob/master/Documentation/troubleshooting.md

After that, CoreDNS works fine:

kubectl exec -i -t dnsutils -- nslookup kubernetes.default
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1

and

kubectl logs --namespace=kube-system -l k8s-app=kube-dns
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.6
linux/amd64, go1.17.1, 13a9191
[INFO] Reloading
[INFO] plugin/health: Going into lameduck mode for 5s
[INFO] plugin/reload: Running configuration MD5 = 3d3f6363f05ccd60e0f885f0eca6c5ff
[INFO] Reloading complete
[INFO] 127.0.0.1:38619 - 51020 "HINFO IN 2350959537417504421.4590630780106405557. udp 57 false 512" NOERROR qr,rd,ra 132 0.055869098s
[INFO] 10.244.2.9:38352 - 33723 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd 155 0.000133217s
[INFO] 10.244.2.9:34998 - 21047 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd 106 0.000088032s
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.6

Upvotes: 0

ow-me

Reputation: 161

I'm adding my solution even though the question is quite old. I had the same problem, but in my case the public DNS servers were unreachable because of firewall policies. To solve it, I edited the ConfigMap used by CoreDNS:

kubectl -n kube-system edit configmaps coredns -o yaml

Then I changed the forward option to point to the public IP of a DNS server that the firewall allows.

Then I restarted the DNS service:

kubectl -n kube-system rollout restart deployment coredns

Upvotes: 1

jhonsfran

Reputation: 31

I used kubectl -n kube-system rollout restart deployment coredns to fix the problem, but now I have to restart coredns each time a new node is added to the cluster.

Upvotes: 2

Alejandro703

Reputation: 1358

I had a similar problem. Restarting the coredns deployment solved it for me:

kubectl -n kube-system rollout restart deployment coredns

Upvotes: 115

atealxt

Reputation: 336

Check the CoreDNS pod logs. If you see errors like:

# kubectl logs --namespace=kube-system coredns-XXX
  ...
  [ERROR] plugin/errors ... HINFO: read udp ... read: no route to host

Then make sure firewalld masquerade is enabled on the host:

# firewall-cmd --list-all
  ... 
  masquerade: yes

Enable it if it's "no":

# firewall-cmd --add-masquerade --permanent
# firewall-cmd --reload

Note: you may need to restart/reboot after this.

Upvotes: 12

gzc

Reputation: 8639

I encountered the same issue. I followed the dns-debugging-resolution doc and checked the DNS-related pods, services, and endpoints; everything was running without error messages. Finally, I found that my calico service was dead. After I started the calico service and waited several minutes, it worked.

Upvotes: 8

Javier Salmeron

Reputation: 8835

Some ideas come to mind:

Upvotes: 5
