Pegladon

Reputation: 603

Istio Ingress resulting in "no healthy upstream"

I am deploying an outward-facing service that is exposed behind a NodePort and then an Istio ingress. The deployment uses manual sidecar injection. Once the deployment, NodePort and ingress are running, I can make a request to the Istio ingress.

For some unknown reason, the request does not route through to my deployment and instead displays the text "no healthy upstream". Why is this, and what is causing it?

I can see in the HTTP response that the status code is 503 (Service Unavailable) and the server is "envoy". The deployment itself is functioning: I can port-forward to it and everything works as expected.

Upvotes: 32

Views: 105399

Answers (6)

Peter L

Reputation: 3361

I was getting no healthy upstream because the deployment hosting the endpoint/UI was "unhealthy".

Despite being able to exec to the pod and curl localhost/..., it was only after I got the health checks working again that I was able to reach the UI externally.

The main clue was when I did kubectl get deployments, I saw READY 0/1...

> kubectl get deployments -n tmk
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
dev-my-deployment   0/1     1            0           136d
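
A READY count of 0/1 usually means the readiness probe is failing, so the pod is removed from the Service's endpoints and Envoy sees no upstream. As an illustration (the name, image, port, and path below are placeholders, not the original setup), a Deployment with a readiness probe that must pass before the pod counts as READY looks roughly like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dev-my-deployment        # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: ui
        image: example/ui:latest           # placeholder image
        ports:
        - containerPort: 8080
        readinessProbe:                    # pod only counts as READY once this passes
          httpGet:
            path: /healthz                 # assumed health endpoint
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
```

This matches the symptom above: curl inside the pod works, but until the probe succeeds there are no healthy endpoints for Istio to route to.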

Upvotes: 1

John Bob Joe

Reputation: 31

From my experience, the "no healthy upstream" error can have different causes. Usually, Istio has received ingress traffic that should be forwarded (the client request, or the Istio downstream), but the destination is unavailable (the Istio upstream / Kubernetes Service). This results in an HTTP 503 "no healthy upstream" error.

1.) Broken VirtualService definitions

If you have a destination in your VirtualService context where the traffic should be routed, ensure this destination exists (the hostname is correct and the service is reachable from that namespace).

2.) ImagePullBackOff / Terminating / Service is not available

Ensure your destination is available in general. Sometimes no pod is available, so no upstream will be available either.

3.) ServiceEntry - same destination in 2 lists, but lists with different DNS Rules

Check your namespace for ServiceEntry objects with:

kubectl -n <namespace> get serviceentry

If the result has more than one entry (multiple lines in one ServiceEntry object), check whether the same destination address (e.g. foo.com) appears in several lines. If it does, ensure that the "DNS" column does not show different resolution settings for it (e.g. one line uses DNS while another uses NONE). If it does differ, that is an indicator that you are trying to apply different DNS settings to the same destination address.

A solution is:

a) Unify the DNS setting: set all lines to NONE or all to DNS, but do not mix them.

b) Ensure the destination (foo.com) appears in only one line, so a collision of different DNS rules cannot occur.

Option a) involves restarting the istio-ingressgateway pods (data plane) to take effect.

Option b) involves no restart of the Istio data plane or control plane.
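
To illustrate the unified setting from option a) (foo.com, the name, and the port are placeholders), a single ServiceEntry with one consistent resolution for the host looks roughly like this:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: foo-external
spec:
  hosts:
  - foo.com              # this host must not appear elsewhere with a different resolution
  location: MESH_EXTERNAL
  ports:
  - number: 443
    name: https
    protocol: TLS
  resolution: DNS        # pick one resolution (DNS or NONE) for this host everywhere
```

The point is only that every line mentioning foo.com agrees on the resolution mode.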

Basically: it helps to check the sync status between the control plane (istiod) and the data plane (istio-ingressgateway) with

istioctl proxy-status

The output of istioctl proxy-status should show SYNCED in every column; this confirms that the control plane and data plane are in sync. If not, you can restart the istio-ingressgateway deployment or the istiod deployment to force "fresh" processes.

Further, it helped to run

istioctl analyze -A

to ensure that the targets referenced in the VirtualService context are checked and do exist. If a VirtualService definition contains routing rules whose destination is unavailable, istioctl analyze -A can detect these unavailable destinations.

Furthermore, reading the log files of the istiod container helps. The istiod error messages often indicate the context of the error (which namespace, service, or Istio setting is involved). You can read them the default way with

kubectl -n istio-system logs <nameOfIstioDPod>


Upvotes: 2

Ualter Jr.

Reputation: 2438

Just in case, like me, you get curious... even though in my scenario the cause of the error was clear...

Error cause: I had two versions of the same service (v1 and v2) and an Istio VirtualService configured with weighted traffic route destinations: 95% goes to v1 and 5% goes to v2. As I hadn't deployed v1 (yet), of course, the error "503 - no healthy upstream" showed up for 95% of the requests.
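
A weighted route of that kind would look roughly like the following sketch (the host and subset names are assumptions modeled on the service names below, not the exact manifest from this scenario):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: teachstore-course
spec:
  hosts:
  - teachstore-course.development.svc.cluster.local
  http:
  - route:
    - destination:
        host: teachstore-course.development.svc.cluster.local
        subset: v1
      weight: 95      # v1 not deployed yet -> 95% of requests get "no healthy upstream"
    - destination:
        host: teachstore-course.development.svc.cluster.local
        subset: v2
      weight: 5
```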

OK, even so, knowing the problem and how to fix it (just deploy v1), I was wondering: how can I get more information about this error? How could I analyze it more deeply to find out what was happening?

This is a way of investigating using the configuration command line utility of Istio, the istioctl:

# 1) Check the proxies status -->
  $ istioctl proxy-status
# Result -->
  NAME                                                   CDS        LDS        EDS        RDS          PILOT                       VERSION
  ...
  teachstore-course-v1-74f965bd84-8lmnf.development      SYNCED     SYNCED     SYNCED     SYNCED       istiod-86798869b8-bqw7c     1.5.0
  ...
  ...

# 2) Get the outbound cluster names from the JSON result using the proxy (the service with the problem) -->
  $ istioctl proxy-config cluster teachstore-course-v1-74f965bd84-8lmnf.development --fqdn teachstore-course.development.svc.cluster.local -o json
#    Or, if you have jq installed locally (extracts only what we need) -->
  $ istioctl proxy-config cluster teachstore-course-v1-74f965bd84-8lmnf.development --fqdn teachstore-course.development.svc.cluster.local -o json | jq -r .[].name
# Result -->
  outbound|80||teachstore-course.development.svc.cluster.local
  inbound|80|9180-tcp|teachstore-course.development.svc.cluster.local
  outbound|80|v1|teachstore-course.development.svc.cluster.local
  outbound|80|v2|teachstore-course.development.svc.cluster.local

# 3) Check the endpoints of "outbound|80|v2|teachstore-course..." using v1 proxy -->
  $ istioctl proxy-config endpoints teachstore-course-v1-74f965bd84-8lmnf.development --cluster "outbound|80|v2|teachstore-course.development.svc.cluster.local"
# Result (the v2, 5% of the traffic route is ok, there are healthy targets) -->
  ENDPOINT             STATUS      OUTLIER CHECK     CLUSTER
  172.17.0.28:9180     HEALTHY     OK                outbound|80|v2|teachstore-course.development.svc.cluster.local
  172.17.0.29:9180     HEALTHY     OK                outbound|80|v2|teachstore-course.development.svc.cluster.local

# 4) However, for the v1 version "outbound|80|v1|teachstore-course..." -->
$ istioctl proxy-config endpoints teachstore-course-v1-74f965bd84-8lmnf.development --cluster "outbound|80|v1|teachstore-course.development.svc.cluster.local"
  ENDPOINT             STATUS      OUTLIER CHECK     CLUSTER
# Nothing! Empty, no Pods; that explains the "no healthy upstream" 95% of the time.

Upvotes: 6

仲夏叶

Reputation: 31

Delete the DestinationRule (destinationrules.networking.istio.io) and recreate the VirtualService (virtualservice.networking.istio.io). In the transcript below, the Chinese curl response "该服务节点 10.210.11.221 心跳正常!" means "the heartbeat of service node 10.210.11.221 is normal!":

[root@10-20-10-110 ~]# curl http://dprovider.example.com:31400/dw/provider/beat
no healthy upstream[root@10-20-10-110 ~]# 
[root@10-20-10-110 ~]# curl http://10.210.11.221:10100/dw/provider/beat
"该服务节点  10.210.11.221  心跳正常!"[root@10-20-10-110 ~]# 
[root@10-20-10-110 ~]# 
[root@10-20-10-110 ~]# cat /home/example_service_yaml/vs/dw-provider-service.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: dw-provider-service
  namespace: example
spec:
  hosts:
  - "dprovider.example.com"
  gateways:
  - example-gateway
  http:
  - route:
    - destination:
        host: dw-provider-service 
        port:
          number: 10100
        subset: "v1-0-0"
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: dw-provider-service
  namespace: example
spec:
  host: dw-provider-service
  subsets:
  - name: "v1-0-0"
    labels:
      version: 1.0.0

[root@10-20-10-110 ~]# vi /home/example_service_yaml/vs/dw-provider-service.yaml 
[root@10-20-10-110 ~]# kubectl -n example get vs -o wide | grep dw                       
dw-collection-service    [example-gateway]   [dw.collection.example.com]                       72d
dw-platform-service      [example-gateway]   [dplatform.example.com]                           81d
dw-provider-service      [example-gateway]   [dprovider.example.com]                           21m
dw-sync-service          [example-gateway]   [dw-sync-service dsync.example.com]               34d
[root@10-20-10-110 ~]# kubectl -n example delete vs dw-provider-service 
virtualservice.networking.istio.io "dw-provider-service" deleted
[root@10-20-10-110 ~]# kubectl -n example delete d dw-provider-service   
daemonsets.apps                       deniers.config.istio.io               deployments.extensions                dogstatsds.config.istio.io            
daemonsets.extensions                 deployments.apps                      destinationrules.networking.istio.io  
[root@10-20-10-110 ~]# kubectl -n example delete destinationrules.networking.istio.io dw-provider-service 
destinationrule.networking.istio.io "dw-provider-service" deleted
[root@10-20-10-110 ~]# kubectl apply -f /home/example_service_yaml/vs/dw-provider-service.yaml 
virtualservice.networking.istio.io/dw-provider-service created
[root@10-20-10-110 ~]# curl http://dprovider.example.com:31400/dw/provider/beat
"该服务节点  10.210.11.221  心跳正常!"[root@10-20-10-110 ~]# 
[root@10-20-10-110 ~]# 

Upvotes: 3

Malathi

Reputation: 2195

I faced this issue when my pod was in the ContainerCreating state, which resulted in a 503 error. Also, as @pegaldon explained, it can occur due to incorrect route configuration or because no gateways were created by the user.

Upvotes: 2

Pegladon

Reputation: 603

Although this is a somewhat general error resulting from a routing issue within an improper Istio setup, I will provide general advice to anyone coming across the same issue.

In my case the issue was an incorrect route rule configuration: the native Kubernetes services were functioning, but the Istio routing rules were misconfigured, so Istio could not route from the ingress to the service.
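
For a routing misconfiguration like this, the usual checklist is that the Gateway's hosts, the VirtualService's hosts/gateways fields, and the destination host must all line up. A minimal sketch of a correctly wired pair (all names, hosts, and ports are placeholders):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: my-gateway
spec:
  selector:
    istio: ingressgateway        # binds to the default Istio ingress gateway pods
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "myapp.example.com"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
  - "myapp.example.com"          # must match a host exposed by the Gateway
  gateways:
  - my-gateway                   # must reference the Gateway above
  http:
  - route:
    - destination:
        host: my-service         # must be a resolvable Kubernetes Service in this namespace
        port:
          number: 8080
```

If any of those links is broken (wrong host, missing gateway reference, nonexistent destination service), the ingress gateway has no upstream to send to and returns "no healthy upstream".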

Upvotes: 5
