Reputation: 140
I have a GKE cluster with 4 nodes in an instance group. I deployed an Ingress and several pods (1 replica only of each, so each pod runs on only 1 node). I notice on the Google Console (Ingress details page) that all backend services remain Unhealthy, although the health checks on the running pods are OK and my application is running. My understanding is that it says unhealthy because, out of the 4 nodes, only 1 node is running an instance of a given pod (the back-end service details say "1 of 4 instances healthy"). Am I correct, and should I worry and try to fix this? It's a bit strange to accept an Unhealthy status when the application is running...
Edit: After further investigation, scaling down to 2 nodes and activating the health check logs, I can see that the backend service status seems to be the status of the last executed health check: if the node hosting the pod is checked last, the backend shows healthy, otherwise it shows unhealthy.
GKE version: 1.16.13-gke.1
My ingress definition:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    ingress.gcp.kubernetes.io/pre-shared-cert: mcrt-dc729887-5c67-4388-9327-e4f76baf9eaf
    ingress.kubernetes.io/backends: '{"k8s-be-30301--503461913abc33d7":"UNHEALTHY","k8s-be-31206--503461913abc33d7":"HEALTHY","k8s-be-31253--503461913abc33d7":"HEALTHY","k8s-be-31267--503461913abc33d7":"HEALTHY","k8s-be-31432--503461913abc33d7":"UNHEALTHY","k8s-be-32238--503461913abc33d7":"HEALTHY","k8s-be-32577--503461913abc33d7":"UNHEALTHY","k8s-be-32601--503461913abc33d7":"UNHEALTHY"}'
    ingress.kubernetes.io/https-forwarding-rule: k8s2-fs-sfdowd2x-city-foobar-cloud-8cfrc00p
    ingress.kubernetes.io/https-target-proxy: k8s2-ts-sfdowd2x-city-foobar-cloud-8cfrc00p
    ingress.kubernetes.io/ssl-cert: mcrt-dc729887-5c67-4388-9327-e4f76baf9eaf
    ingress.kubernetes.io/url-map: k8s2-um-sfdowd2x-city-foobar-cloud-8cfrc00p
    kubernetes.io/ingress.allow-http: "false"
    kubernetes.io/ingress.global-static-ip-name: city
    networking.gke.io/managed-certificates: foobar-cloud
  creationTimestamp: "2020-08-06T08:25:18Z"
  finalizers:
  - networking.gke.io/ingress-finalizer-V2
  generation: 1
  labels:
    app.kubernetes.io/instance: foobar-cloud
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: foobar-cloud
    helm.sh/chart: foobar-cloud-0.4.58
  name: foobar-cloud
  namespace: city
  resourceVersion: "37878"
  selfLink: /apis/extensions/v1beta1/namespaces/city/ingresses/foobar-cloud
  uid: 751f78cf-2344-46e3-b87e-04d6d903acd5
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: foobar-cloud-server
          servicePort: 9999
        path: /foobar/server
      - backend:
          serviceName: foobar-cloud-server
          servicePort: 9999
        path: /foobar/server/*
status:
  loadBalancer:
    ingress:
    - ip: xx.xx.xx.xx
Upvotes: 8
Views: 9965
Reputation: 5426
I had the same issue, but I am using the ingress NGINX controller instead of the default GKE controller.
It turned out this was because the ingress NGINX controller was not running as a DaemonSet on those nodes: only the nodes where the controller is actually running show as healthy.
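If it helps, here is a minimal sketch of running the controller on every node, assuming you install with the official ingress-nginx Helm chart (the release name and values file name are placeholders):

# values.yaml for the ingress-nginx Helm chart
# Running the controller as a DaemonSet puts one controller pod on each node,
# so the load balancer health check can succeed on every node.
controller:
  kind: DaemonSet

Applied with something like helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx -f values.yaml.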
Upvotes: 0
Reputation: 1
I was able to resolve my version of this problem by adding firewall allow rules for the centralized GCP health check subnets.
I had a similar issue where my ILB ingress resource showed unhealthy backends:
Annotations: ingress.kubernetes.io/backends: {"k8s1-abd1234-default-api-5000-abcd1234":"UNHEALTHY"}
The backends in my case are managed by the GKE Ingress operator, so you can inspect the backend/health check in the GCP console. Clicking through, you can see whether the health checks are passing and whether they are configured properly. As far as I can tell, the GCP health checks pull their configuration from the livenessProbe configuration on the associated pod/deployment.
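For reference, a sketch of such a firewall rule; the rule name, network and port are placeholders, and 130.211.0.0/22 and 35.191.0.0/16 are the source ranges Google documents for its load balancer health checks:

# Allow Google health check probes to reach the backends (port is a placeholder)
gcloud compute firewall-rules create allow-gcp-health-checks \
    --network=default \
    --source-ranges=130.211.0.0/22,35.191.0.0/16 \
    --allow=tcp:9999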
Upvotes: 0
Reputation: 11
In Google Cloud it is necessary to have an endpoint that returns code 200. For C# / .NET Core you can see how to do this in Health Check. After you create the endpoint, you need to configure two things:
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: http-hc-config
spec:
  healthCheck:
    checkIntervalSec: 60
    port: 80
    type: HTTP
    requestPath: /health
---
apiVersion: v1
kind: Service
metadata:
  name: app-service
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/backend-config: '{"default": "http-hc-config"}'
spec:
  selector:
    type: app
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  type: NodePort
This worked for me.
Upvotes: 1
Reputation: 661
I was having a similar problem: GCP network endpoint saying the backend was unhealthy.
The problem in my case was that my application would not return 200 in /, because it requires authentication.
Make sure you configure livenessProbe and readinessProbe to do an httpGet to a path that returns 200 OK. In my case:
livenessProbe:
  httpGet:
    path: /ping
    port: 4180
readinessProbe:
  httpGet:
    path: /ping
    port: 4180
More details:
When the Ingress is created, the controller that tells GCP how to configure the Cloud Load Balancer copies the probe info from the Deployment spec, and that is what is used to determine the health of the Google Cloud backend endpoint.
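For context, a minimal sketch of where those probes sit in the Deployment; the names, labels and image are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:latest   # placeholder image
        ports:
        - containerPort: 4180
        # Per the explanation above, the GKE Ingress controller derives the
        # Cloud Load Balancer health check from these probes.
        livenessProbe:
          httpGet:
            path: /ping
            port: 4180
        readinessProbe:
          httpGet:
            path: /ping
            port: 4180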
I discovered this because when I deployed my application I had no probes configured. Then I edited the deployment and added both probes but it didn't work. I could see this in the logs of my app:
[2021/11/22 18:38:43] [oauthproxy.go:862] No valid authentication in request. Initiating login.
130.211.1.166:32768 - e8d8b7f9-8cc9-419a-aeb8-898260169a2c - - [2021/11/22 18:38:43] 10.56.2.24 GET - "/" HTTP/1.1 "GoogleHC/1.0" 403 8092 0.000
10.56.2.1:45770 - e7a9d52a-ecbe-4e1c-af69-65ddf432d92c - - [2021/11/22 18:38:50] 10.56.2.24:4180 GET - "/ping" HTTP/1.1 "kube-probe/1.20+" 200 2 0.000
As you can see, there is a request to / from an agent with user agent "GoogleHC/1.0". This is what GCP uses to determine if the backend is healthy. Then there is another request to /ping from an agent with user agent kube-probe/1.20+, which is the readinessProbe in Kubernetes.
Then I deleted the Ingress and created it again, and this time it worked:
130.211.1.180:39854 - d069dd2c-6733-4029-8c9b-fa03917ca2a7 - - [2021/11/22 18:57:32] 10.56.2.27 GET - "/ping" HTTP/1.1 "GoogleHC/1.0" 200 2 0.000
10.56.2.1:35598 - 85eeaf1c-a6e6-4cc8-a6ed-931f504f9493 - - [2021/11/22 18:57:36] 10.56.2.27:4180 GET - "/ping" HTTP/1.1 "kube-probe/1.20+" 200 2 0.000
Both agents are now using the right path, the one from the readiness probe.
Upvotes: 4
Reputation: 1438
Experienced the same issue as @jfc.
I specified livenessProbe and readinessProbe in my pod with a custom healthcheck path.
It was sufficient to fix the kube-probe healthchecks, but not enough to fix the GoogleHC healthchecks. I had to manually configure the healthcheck in the GCP console.
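If you prefer to avoid manual console changes, the BackendConfig approach shown in an earlier answer can pin the custom path declaratively; a sketch, with the name, path and port as placeholders:

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: custom-hc-config          # placeholder name
spec:
  healthCheck:
    type: HTTP
    requestPath: /custom-health   # your custom health check path
    port: 8080                    # the port your container serves on

The Service then references it via the cloud.google.com/backend-config annotation, as in that answer.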
Upvotes: 1
Reputation: 140
I finally found out the cause of this.
My services did not specify any value for externalTrafficPolicy, so the default value of Cluster applied.
However, I have a NetworkPolicy defined whose goal is to prevent traffic from other namespaces, as described here.
I added the IPs of the load balancer probes as stated in this doc, but was missing the rule allowing connections from the other node IPs in the cluster.
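For illustration, a sketch of the extra ingress rules involved, to sit alongside the existing same-namespace rule; the policy name and the node CIDR are placeholders, while 130.211.0.0/22 and 35.191.0.0/16 are the probe ranges from the linked doc:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-lb-probes-and-nodes   # placeholder name
  namespace: city
spec:
  podSelector: {}                   # applies to all pods in the namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    # Google Cloud load balancer health check probe ranges
    - ipBlock:
        cidr: 130.211.0.0/22
    - ipBlock:
        cidr: 35.191.0.0/16
    # Node IPs of the cluster (placeholder; use your node subnet)
    - ipBlock:
        cidr: 10.128.0.0/20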
Upvotes: 0
Reputation: 178
I had a very similar issue. I don't need to share my setup as it's almost identical to the OP's, and I'm also using the GKE Ingress Controller. I had manually added externalTrafficPolicy: Local to the Service called by the Ingress backend, and when I changed the externalTrafficPolicy from Local to Cluster (as per dany L above), the Ingress backend service immediately reported healthy.
I then removed the externalTrafficPolicy line from the called Service and am now set up with the GKE Ingress Controller using container-native load balancing, with all backend services reporting healthy.
Upvotes: 4
Reputation: 2654
Please check the yaml file for your service. If it shows externalTrafficPolicy: Local, then this is expected behavior.
Local means traffic will always go to a pod on the same node, while everything else is dropped. So if your deployment has only 1 replica serving, you will only have one healthy instance.
You can easily test that theory: scale up to 2 replicas and observe the behavior. I foresee 1 healthy instance if the 2nd replica lands on the same node as the first, and 2/4 healthy if it lands on a different node. Let me know.
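For completeness, a sketch of the relevant part of the Service; leaving externalTrafficPolicy out (or setting it to Cluster, the default) lets every node forward traffic to the pod, so all backends report healthy. The name and port are taken from the question, the selector is a placeholder:

apiVersion: v1
kind: Service
metadata:
  name: foobar-cloud-server
spec:
  type: NodePort
  externalTrafficPolicy: Cluster   # default; Local marks only pod-hosting nodes healthy
  selector:
    app: foobar-cloud-server       # placeholder selector
  ports:
  - port: 9999
    targetPort: 9999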
Upvotes: 0