Reputation: 140
I have a GKE cluster with 4 nodes in an instance group. I deployed an Ingress and several pods (1 replica only of each, so each pod runs on only 1 node). I notice on the Google Console (Ingress details page) that all backend services remain Unhealthy, although the health checks on the running pods are OK and my application is running. My understanding is that it says unhealthy because, out of the 4 nodes, only 1 node is running an instance of a given pod (the back-end service details say "1 of 4 instances healthy"). Am I correct, and should I worry and try to fix this? It's a bit strange to accept an Unhealthy status when the application is running...
Edit: After further investigation, scaling down to 2 nodes and activating the health check logs, I can see that the backend service status seems to be the status of the last executed health check: if the node hosting the pod is checked last, the backend shows healthy, otherwise it shows unhealthy.
GKE version: 1.16.13-gke.1
My ingress definition:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    ingress.gcp.kubernetes.io/pre-shared-cert: mcrt-dc729887-5c67-4388-9327-e4f76baf9eaf
    ingress.kubernetes.io/backends: '{"k8s-be-30301--503461913abc33d7":"UNHEALTHY","k8s-be-31206--503461913abc33d7":"HEALTHY","k8s-be-31253--503461913abc33d7":"HEALTHY","k8s-be-31267--503461913abc33d7":"HEALTHY","k8s-be-31432--503461913abc33d7":"UNHEALTHY","k8s-be-32238--503461913abc33d7":"HEALTHY","k8s-be-32577--503461913abc33d7":"UNHEALTHY","k8s-be-32601--503461913abc33d7":"UNHEALTHY"}'
    ingress.kubernetes.io/https-forwarding-rule: k8s2-fs-sfdowd2x-city-foobar-cloud-8cfrc00p
    ingress.kubernetes.io/https-target-proxy: k8s2-ts-sfdowd2x-city-foobar-cloud-8cfrc00p
    ingress.kubernetes.io/ssl-cert: mcrt-dc729887-5c67-4388-9327-e4f76baf9eaf
    ingress.kubernetes.io/url-map: k8s2-um-sfdowd2x-city-foobar-cloud-8cfrc00p
    kubernetes.io/ingress.allow-http: "false"
    kubernetes.io/ingress.global-static-ip-name: city
    networking.gke.io/managed-certificates: foobar-cloud
  creationTimestamp: "2020-08-06T08:25:18Z"
  finalizers:
  - networking.gke.io/ingress-finalizer-V2
  generation: 1
  labels:
    app.kubernetes.io/instance: foobar-cloud
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: foobar-cloud
    helm.sh/chart: foobar-cloud-0.4.58
  name: foobar-cloud
  namespace: city
  resourceVersion: "37878"
  selfLink: /apis/extensions/v1beta1/namespaces/city/ingresses/foobar-cloud
  uid: 751f78cf-2344-46e3-b87e-04d6d903acd5
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: foobar-cloud-server
          servicePort: 9999
        path: /foobar/server
      - backend:
          serviceName: foobar-cloud-server
          servicePort: 9999
        path: /foobar/server/*
status:
  loadBalancer:
    ingress:
    - ip: xx.xx.xx.xx
Upvotes: 8
Views: 9965
Reputation: 5426
I had the same issue, but I am using the ingress NGINX controller instead of the default GKE controller.
It turned out this was because the ingress NGINX controller was not running as a DaemonSet on those nodes: only the nodes where the controller is actually running show as healthy.
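If it helps, here is a minimal sketch of running the controller on every node, assuming you install with the official ingress-nginx Helm chart (the release name and values file name are placeholders):

# values.yaml for the ingress-nginx Helm chart
# Running the controller as a DaemonSet puts one controller pod on each node,
# so the load balancer health check can succeed on every node.
controller:
  kind: DaemonSet

Applied with something like helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx -f values.yaml.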
Upvotes: 0
Reputation: 1
I was able to resolve my version of this problem by adding firewall allow rules for the centralized GCP health check subnets.
I had a similar issue where my ILB ingress resource showed unhealthy backends:
Annotations: ingress.kubernetes.io/backends: {"k8s1-abd1234-default-api-5000-abcd1234":"UNHEALTHY"}
The backends in my case are managed by the GKE Ingress operator, so you can inspect the backend/health check in the GCP console. Clicking through, you can see whether the health checks are passing and whether they are configured properly. As far as I can tell, the GCP health checks pull their configuration from the livenessProbe configuration on the associated pod/deployment.
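For reference, a sketch of such a firewall rule; the rule name, network and port are placeholders, and 130.211.0.0/22 and 35.191.0.0/16 are the source ranges Google documents for its load balancer health checks:

# Allow Google health check probes to reach the backends (port is a placeholder)
gcloud compute firewall-rules create allow-gcp-health-checks \
    --network=default \
    --source-ranges=130.211.0.0/22,35.191.0.0/16 \
    --allow=tcp:9999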
Upvotes: 0
Reputation: 11
In Google Cloud it is necessary to have an endpoint that returns code 200. For C# / .NET Core you can see how to do this in Health Check. After you create the endpoint, you need to configure two things:
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: http-hc-config
spec:
  healthCheck:
    checkIntervalSec: 60
    port: 80
    type: HTTP
    requestPath: /health
---
apiVersion: v1
kind: Service
metadata:
  name: app-service
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/backend-config: '{"default": "http-hc-config"}'
spec:
  selector:
    type: app
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  type: NodePort
This worked for me.
Upvotes: 1
Reputation: 661
I was having a similar problem: GCP network endpoint saying the backend was unhealthy.
The problem in my case was that my application would not return 200 in /, because it requires authentication.
Make sure you configure livenessProbe and readinessProbe to do an httpGet to a path that returns 200 OK. In my case:
livenessProbe:
  httpGet:
    path: /ping
    port: 4180
readinessProbe:
  httpGet:
    path: /ping
    port: 4180
More details:
When the Ingress is created, the controller that tells GCP how to configure the Cloud Load Balancer copies the probe info from the Deployment spec, and that is what is used to determine the health of the Google Cloud backend endpoint.
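For context, a minimal sketch of where those probes sit in the Deployment; the names, labels and image are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:latest   # placeholder image
        ports:
        - containerPort: 4180
        # Per the explanation above, the GKE Ingress controller derives the
        # Cloud Load Balancer health check from these probes.
        livenessProbe:
          httpGet:
            path: /ping
            port: 4180
        readinessProbe:
          httpGet:
            path: /ping
            port: 4180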
I discovered this because when I deployed my application I had no probes configured. Then I edited the deployment and added both probes but it didn't work. I could see this in the logs of my app:
[2021/11/22 18:38:43] [oauthproxy.go:862] No valid authentication in request. Initiating login.
130.211.1.166:32768 - e8d8b7f9-8cc9-419a-aeb8-898260169a2c - - [2021/11/22 18:38:43] 10.56.2.24 GET - "/" HTTP/1.1 "GoogleHC/1.0" 403 8092 0.000
10.56.2.1:45770 - e7a9d52a-ecbe-4e1c-af69-65ddf432d92c - - [2021/11/22 18:38:50] 10.56.2.24:4180 GET - "/ping" HTTP/1.1 "kube-probe/1.20+" 200 2 0.000
As you can see, there is a request to / from an agent with user agent "GoogleHC/1.0". This is what GCP uses to determine if the backend is healthy. Then there is another request to /ping from an agent with user agent kube-probe/1.20+, which is the readinessProbe in Kubernetes.
Then I deleted the Ingress and created it again, and this time it worked:
130.211.1.180:39854 - d069dd2c-6733-4029-8c9b-fa03917ca2a7 - - [2021/11/22 18:57:32] 10.56.2.27 GET - "/ping" HTTP/1.1 "GoogleHC/1.0" 200 2 0.000
10.56.2.1:35598 - 85eeaf1c-a6e6-4cc8-a6ed-931f504f9493 - - [2021/11/22 18:57:36] 10.56.2.27:4180 GET - "/ping" HTTP/1.1 "kube-probe/1.20+" 200 2 0.000
Both agents are now using the right path, the one from the readiness probe.
Upvotes: 4
Reputation: 1438
Experienced the same issue as @jfc.
I specified livenessProbe and readinessProbe in my pod with a custom healthcheck path.
It was sufficient to fix the kube-probe healthchecks, but not enough to fix the GoogleHC healthchecks. I had to manually configure the healthcheck in the GCP console.
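If you prefer to avoid manual console changes, the BackendConfig approach shown in an earlier answer can pin the custom path declaratively; a sketch, with the name, path and port as placeholders:

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: custom-hc-config          # placeholder name
spec:
  healthCheck:
    type: HTTP
    requestPath: /custom-health   # your custom health check path
    port: 8080                    # the port your container serves on

The Service then references it via the cloud.google.com/backend-config annotation, as in that answer.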
Upvotes: 1
Reputation: 140
I finally found out the cause of this.
My services did not specify any value for externalTrafficPolicy, so the default value of Cluster applied.
However, I have a NetworkPolicy defined whose goal is to prevent traffic from other namespaces, as described here.
I added the IPs of the load balancer probes as stated in this doc, but was missing the rule allowing connections from the other node IPs in the cluster.
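For illustration, a sketch of the extra ingress rules involved, to sit alongside the existing same-namespace rule; the policy name and the node CIDR are placeholders, while 130.211.0.0/22 and 35.191.0.0/16 are the probe ranges from the linked doc:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-lb-probes-and-nodes   # placeholder name
  namespace: city
spec:
  podSelector: {}                   # applies to all pods in the namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    # Google Cloud load balancer health check probe ranges
    - ipBlock:
        cidr: 130.211.0.0/22
    - ipBlock:
        cidr: 35.191.0.0/16
    # Node IPs of the cluster (placeholder; use your node subnet)
    - ipBlock:
        cidr: 10.128.0.0/20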
Upvotes: 0
Reputation: 178
I had a very similar issue. I don't need to share my setup as it's almost identical to the OP's, and I'm also using the GKE Ingress Controller. I had manually added externalTrafficPolicy: Local to the Service called by the Ingress backend, and when I changed the externalTrafficPolicy from Local to Cluster (as per dany L above), the Ingress backend service immediately reported healthy.
I then removed the externalTrafficPolicy line from the called Service and am now set up with the GKE Ingress Controller using container-native load balancing, with all backend services reporting healthy.
Upvotes: 4
Reputation: 2654
Please check the yaml file for your service. If it shows externalTrafficPolicy: Local, then this is expected behavior.
Local means traffic will always go to a pod on the same node, while everything else is dropped. So if your deployment has only 1 replica serving, you will only have one healthy instance.
You can easily test that theory: scale up to 2 replicas and observe the behavior. I foresee 1 healthy instance if the 2nd replica lands on the same node as the first, and 2/4 healthy if it lands on a different node. Let me know.
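For completeness, a sketch of the relevant part of the Service; leaving externalTrafficPolicy out (or setting it to Cluster, the default) lets every node forward traffic to the pod, so all backends report healthy. The name and port are taken from the question, the selector is a placeholder:

apiVersion: v1
kind: Service
metadata:
  name: foobar-cloud-server
spec:
  type: NodePort
  externalTrafficPolicy: Cluster   # default; Local marks only pod-hosting nodes healthy
  selector:
    app: foobar-cloud-server       # placeholder selector
  ports:
  - port: 9999
    targetPort: 9999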
Upvotes: 0