Stuart Grimshaw

Reputation: 1571

Upstream timed out (110: Connection timed out) on Kubernetes Ingress

I've set up a Kubernetes cluster and, as part of that, created an Ingress rule to forward traffic to a web server.

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: alpha-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    certmanager.k8s.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - alpha.example.com
      secretName: letsencrypt-prod
  rules:
    - host: alpha.example.com
      http:
        paths:
          - backend:
              serviceName: web
              servicePort: 80

Eventually the browser times out with a 504 error, and in the Ingress log I see:

2019/01/27 23:45:38 [error] 41#41: *4943 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.131.24.163, server: alpha.example.com, request: "GET / HTTP/2.0", upstream: "http://10.244.93.12:80/", host: "alpha.example.com"

I don't have any services on that IP address:

$ kgs --all-namespaces
NAMESPACE       NAME                            TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE
default         database                        ClusterIP      10.245.181.187   <none>           5432/TCP                     4d8h
default         kubernetes                      ClusterIP      10.245.0.1       <none>           443/TCP                      9d
default         user-api                        ClusterIP      10.245.41.8      <none>           9000/TCP                     4d8h
default         web                             ClusterIP      10.245.145.213   <none>           80/TCP,443/TCP               34h
ingress-nginx   ingress-nginx                   LoadBalancer   10.245.25.107    <external-ip>    80:31680/TCP,443:32324/TCP   50m
kube-system     grafana                         ClusterIP      10.245.81.91     <none>           80/TCP                       6d1h
kube-system     kube-dns                        ClusterIP      10.245.0.10      <none>           53/UDP,53/TCP,9153/TCP       9d
kube-system     prometheus-alertmanager         ClusterIP      10.245.228.165   <none>           80/TCP                       6d2h
kube-system     prometheus-kube-state-metrics   ClusterIP      None             <none>           80/TCP                       6d2h
kube-system     prometheus-node-exporter        ClusterIP      None             <none>           9100/TCP                     6d2h
kube-system     prometheus-pushgateway          ClusterIP      10.245.147.195   <none>           9091/TCP                     6d2h
kube-system     prometheus-server               ClusterIP      10.245.202.186   <none>           80/TCP                       6d2h
kube-system     tiller-deploy                   ClusterIP      10.245.11.85     <none>           44134/TCP                    9d

If I view the resolv.conf file on the ingress pod, it contains what it should:

$ keti -n ingress-nginx nginx-ingress-controller-c595c6896-klw25 -- cat /etc/resolv.conf
nameserver 10.245.0.10
search ingress-nginx.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

dig/nslookup/host aren't available on that container, but if I create a simple busybox instance it gets the right IP with that same config:

$ keti busybox -- nslookup web
Server:    10.245.0.10
Address 1: 10.245.0.10 kube-dns.kube-system.svc.cluster.local

Name:      web
Address 1: 10.245.145.213 web.default.svc.cluster.local

Can anyone give me any ideas what to try next?

Update #1

Here is the config for web, as requested in the comments. I'm also investigating why I can't directly wget anything from web using a busybox inside the cluster.

apiVersion: v1
kind: Service
metadata:
  labels:
    io.kompose.service: web
    app: web
  name: web
spec:
  ports:
  - name: "80"
    port: 80
    targetPort: 80
  - name: "443"
    port: 443
    targetPort: 443
  selector:
    io.kompose.service: web
status:
  loadBalancer: {}
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: web
  name: web
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        io.kompose.service: web
        app: web
    spec:
      containers:
      - image: <private docker repo>
        imagePullPolicy: IfNotPresent
        name: web
        resources: {}
      imagePullSecrets:
      - name: gcr
status: {}

Update #2

As per Michael's comment below, the IP address that it has resolved for web is one of its endpoints:

$ k get endpoints web
NAME      ENDPOINTS                          AGE
web       10.244.93.12:443,10.244.93.12:80   2d
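
To make that comparison explicit, here is a minimal Python sketch that pulls the upstream address out of the nginx error line above and checks it against the endpoint list for web. The helper names `upstream_from_log` and `is_endpoint` are illustrative assumptions, and the strings are copied by hand from the output in this question; nothing here talks to the cluster.

```python
import re

# Error line copied from the Ingress log above.
LOG_LINE = (
    '2019/01/27 23:45:38 [error] 41#41: *4943 upstream timed out '
    '(110: Connection timed out) while reading response header from upstream, '
    'client: 10.131.24.163, server: alpha.example.com, request: "GET / HTTP/2.0", '
    'upstream: "http://10.244.93.12:80/", host: "alpha.example.com"'
)

# ENDPOINTS column copied from the endpoints listing for web.
ENDPOINTS = "10.244.93.12:443,10.244.93.12:80"


def upstream_from_log(line):
    """Return the 'host:port' nginx tried to reach, or None if the line has none."""
    match = re.search(r'upstream: "https?://([^/"]+)', line)
    return match.group(1) if match else None


def is_endpoint(address, endpoints):
    """Check whether 'host:port' appears in a comma-separated endpoint list."""
    return address in endpoints.split(",")


upstream = upstream_from_log(LOG_LINE)
print(upstream, is_endpoint(upstream, ENDPOINTS))
```

This prints `10.244.93.12:80 True`: the address in the log is the web pod itself rather than a Service ClusterIP, so the name lookup worked and the timeout happened while waiting for the pod to respond.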

Upvotes: 5

Views: 11632

Answers (1)

Stuart Grimshaw

Reputation: 1571

So, this all boiled down to the php-fpm service (user-api in the config below) not having any endpoints, because I'd misconfigured the Service selector!

Some of the more eagle-eyed readers might have spotted that my config began life as a conversion from a docker-compose config file (my dev environment), and I've built on it from there.

The problem arose because I changed the labels and selector on the Deployment, but not on the Service itself.

apiVersion: v1
kind: Service
metadata:
  name: user-api
  labels:
    io.kompose.service: user-api
    app: user-api
spec:
  ports:
    - name: "9000"
      port: 9000
      targetPort: 9000
  selector:
    io.kompose.service: user-api
status:
  loadBalancer: {}
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: user-api
  name: user-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: user-api
  template:
    metadata:
      labels:
        app: user-api
    spec:
... etc

You can see I was still using the old selector that kompose created for me, io.kompose.service: user-api, instead of the newer app: user-api.

I followed the advice from @coderanger and found that, while the nginx service was responding, the php-fpm one wasn't.

A quick look at the documentation for Connecting Applications With Services says:

As mentioned previously, a Service is backed by a group of Pods. These Pods are exposed through endpoints. The Service’s selector will be evaluated continuously and the results will be POSTed to an Endpoints object also named my-nginx.

When I compared the Service's selector with the labels in the Deployment's pod template, I saw they were different. Now they match, and everything works as expected.
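
The matching rule can be sketched in a few lines of Python. This is only an illustration of equality-based selector matching under the assumption that labels and selectors are plain key/value maps; `selector_matches` is a hypothetical helper, not part of any Kubernetes client, and the dictionaries are copied from the user-api config above.

```python
def selector_matches(selector, pod_labels):
    """Return True when every key/value pair in the selector is on the pod.

    A Service with no selector gets no automatically managed endpoints,
    so an empty selector is treated as matching nothing here.
    """
    if not selector:
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Labels from the user-api Deployment's pod template.
pod_labels = {"app": "user-api"}

old_selector = {"io.kompose.service": "user-api"}  # left over from kompose
new_selector = {"app": "user-api"}                 # matches the pod template

print(selector_matches(old_selector, pod_labels))  # False: no endpoints
print(selector_matches(new_selector, pod_labels))  # True: pod becomes an endpoint
```

Extra labels on the pod do not matter; only the keys named in the selector have to be present with the same values. That is why the web Service kept working: its pod template still carried the io.kompose.service: web label.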

Upvotes: 2
