Ben Straub

Reputation: 5776

Kubernetes: slow DNS

I have a Kubernetes cluster that was initialized using the kube-up.sh script inside AWS, and occasionally there's a very slow DNS lookup when finding one service from inside another pod. Here's the basic picture:

    (browser)
        |
        V
      (ELB)
        |
        V
(front-end service)
        |
        V
  (front-end pod)
        |
        V
 (back-end service)
        |
        V
  (back-end pod)
        |
        V
    (database)

I have timing logging installed at the front-end and back-end levels, and their numbers are wildly divergent for some requests. Occasionally we'll see a request that the FE nginx logging says takes 8.3 seconds, but the back-end gunicorn process says takes 30ms.

I can exec into the FE pod and do a curl to the backend endpoint to get timing data according to the example in this blog post, and it looks like this:

        time_namelookup:  3.513
           time_connect:  3.513
        time_appconnect:  0.000
       time_pretransfer:  3.513
          time_redirect:  0.000
     time_starttransfer:  3.520
                        ----------
             time_total:  3.520

So the slowness seems to be coming from DNS. We have a separate cluster set up for staging, and this sort of thing doesn't seem to be happening there, so I'm not sure what to make of it. Most requests happen in a reasonable amount of time, less than 50ms, but every tenth one or so takes multiple seconds to resolve.
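To put a number on "every tenth one or so", the lookup can be repeated from inside the FE pod and timed on its own, without the HTTP request on top. This is only a minimal sketch: `time_lookups` and `slow_fraction` are names made up for illustration, the 1-second threshold is arbitrary, and it assumes Python is available in the container.

```python
import socket
import time


def time_lookups(hostname, attempts=50, resolve=socket.getaddrinfo,
                 clock=time.perf_counter):
    """Resolve `hostname` repeatedly and return the elapsed seconds per attempt."""
    durations = []
    for _ in range(attempts):
        start = clock()
        try:
            resolve(hostname, None)
        except socket.gaierror:
            # A failed lookup still costs time, so it is recorded as well.
            pass
        durations.append(clock() - start)
    return durations


def slow_fraction(durations, threshold=1.0):
    """Fraction of lookups that took longer than `threshold` seconds."""
    if not durations:
        return 0.0
    return sum(d > threshold for d in durations) / len(durations)
```

Calling `slow_fraction(time_lookups("backend"))` with the real back-end service name in place of the placeholder `"backend"` gives the share of slow lookups. Running the same thing on the staging cluster gives a baseline to compare against.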

I found this thread that made it sound like SkyDNS's use of etcd might be the problem, but I'm not sure how to verify that or fix it. And this is happening way too often to be periodic missing configuration values (our traffic isn't that high).

Upvotes: 3

Views: 4862

Answers (2)

Pedro Marques

Reputation: 2682

By default, Kubernetes configures pods to use both SkyDNS (to resolve service names) and the resolver of the underlying infrastructure (to resolve external names). The resolver library inside the Docker container then sends requests to SkyDNS or the external resolver in round-robin fashion. It also generates requests by first trying the full name (e.g. service.namespace.svc.domain) and then trimmed names (e.g. service.namespace.svc; service.namespace). This can result in long timeouts if the first request is sent to the wrong server.
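To see why one lookup can turn into several queries, here is a rough sketch of how a stub resolver builds its list of candidate names from the `search` and `ndots` settings, following the rules documented in resolv.conf(5). The function name and the search domains in the usage note are assumptions for illustration; the resolver in a given container image may behave differently.

```python
def candidate_names(name, search, ndots=1):
    """Return the names a stub resolver would try, in order.

    A trailing dot marks the name as absolute, so nothing is appended.
    A name with at least `ndots` dots is tried as-is first; otherwise
    the search domains are tried first and the bare name comes last.
    """
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]
```

For example, `candidate_names("backend", ["default.svc.cluster.local", "svc.cluster.local"])` yields the two expanded names followed by `backend` itself. Each candidate is a separate query, and any of them can land on a server that does not know the name.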

If you don't need the external resolver, you can override this behaviour with the kubelet flag "--resolv-conf", which lets you specify an alternate set of external resolvers (or none).
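To check which resolvers a pod actually ends up with, the contents of its /etc/resolv.conf can be parsed and inspected. A small sketch, assuming the standard `nameserver` / `search` / `options` keywords; `parse_resolv_conf` is a hypothetical helper, not part of any Kubernetes tooling.

```python
def parse_resolv_conf(text):
    """Extract nameservers, search domains and options from resolv.conf text."""
    config = {"nameservers": [], "search": [], "options": []}
    for raw in text.splitlines():
        line = raw.strip()
        # Lines starting with '#' or ';' are comments.
        if not line or line[0] in "#;":
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            config["nameservers"].append(values[0])
        elif keyword == "search":
            config["search"] = values  # a later search line replaces an earlier one
        elif keyword == "options":
            config["options"].extend(values)
    return config
```

Inside the pod, `parse_resolv_conf(open("/etc/resolv.conf").read())["nameservers"]` lists the configured servers. More than one entry means lookups can be sent to a resolver other than SkyDNS.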

Upvotes: 2

Brendan Burns

Reputation: 734

A bug that was fixed here (https://github.com/kubernetes/kubernetes/pull/13345) has been shown to cause this problem in Kubernetes clusters running 1.0.5 and older. The fix is included in the 1.0.6 release.
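If you have several clusters to check, the version comparison is easy to script. This sketch assumes version strings of the form `v1.0.5` (as `kubectl version` reports them); `is_affected` is a made-up helper name.

```python
import re

# First release that contains the fix, per the answer above.
FIXED_IN = (1, 0, 6)


def is_affected(version):
    """Return True if `version` (e.g. "v1.0.5") predates the 1.0.6 fix."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups()) < FIXED_IN
```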

Upvotes: 4
