Reputation: 603
I am using deploying an outward facing service, that is exposed behind a nodeport and then an istio ingress. The deployment is using manual sidecar injection. Once the deployment, nodeport and ingress are running, I can make a request to the istio ingress.
For some unkown reason, the request does not route through to my deployment and instead displays the text "no healthy upstream". Why is this, and what is causing it?
I can see in the http response that the status code is 503 (Service Unavailable) and the server is "envoy". The deployment is functioning as I can map a port forward to it and everything works as expected.
Upvotes: 32
Views: 105399
Reputation: 3361
I was getting no healthy upstream
because the deployment hosting the endpoint/UI was "unhealthy".
Despite being able to exec
to the pod and curl localhost/...
, it was only after I got the health checks working again that I was able to reach the UI externally.
The main clue was when I did kubectl get deployments
, I saw READY 0/1
...
> kubectl get deployments -n tmk
NA READY UP-TO-DATE AVAILABLE AGE
dev-my-deployment 0/1 1 0 136d
Upvotes: 1
Reputation: 31
From my experience, the "no healthy upstream" error can have different causes. Usually, Istio has received ingress traffic that should be forwarded (the client request, or Istio downstream), but the destination is unavailable (istio upstream / kubernetes service). This results in a HTTP 503 "no healthy upstream" error.
1.) Broken Virtualservice definitions If you have a destination in your VirtualService context where the traffic should be routed, ensure this destination exists (in terms of the hostname is correct, or the service is available from this namespace)
2.) ImagePullBack / Terminating / Service is not available
Ensure your destination is available in general. Sometimes no pod is available, so no upstream will be available too.
3.) ServiceEntry - same destination in 2 lists, but lists with different DNS Rules
Check your namespace for ServiceEntry objects with:
kubectl -n <namespace> get serviceentry
If the result has more than one entry (multiple lines in one ServiceEntry object), check if a destination address (e.g. foo.com) is available in various lines. If the same destination address (e.g. foo.com) is available in various lines, ensure that the column "DNS" does not have different resolution settings (e.g. one line uses DNS, the other line has NONE). If yes, this is an indicator that you try to apply different DNS settings to the same destination address.
A solution is:
a) to unify the DNS setting, setting all lines to NONE or DNS, but not to mix it up.
b) Ensure the destination (foo.com) is available in one line, and a collision of different DNS rules does not appear.
a) involves restarting istio-ingressgateway pods (data plane) to make it work.
b) Involves no restart of istio data or istio control plane.
Basically: It helps to check the status between Control Plane (istiod) and DatapPlane (istio-ingressgateway) with
istioctl proxy-status
The output of istioctl proxy-status should ensure that the columns say "SYNC" this ensures that the control plane and Data Plane are synced. If not, you can restart the istio-ingressgateway deployment or the istiod daemonset, to force "fresh" processes.
Further, it helped to run
istioctl analyze -A
to ensure that targets are checked in the VirtualService context and do exist. If a virtual service definition exists with routing definitions whose destination is unavailable, istioctl analyze -A can detect these unavailable destinations.
Furthermore, reading the logfiles of the istiod container helps. The istiod error messages often indicate the context of the error in the routing (which namespace and service or istio setting). You can use the default way with
kubectl -n istio-system logs <nameOfIstioDPod>
Referenes:
Upvotes: 2
Reputation: 2438
Just in case, like me, you get curious... Even though in my scenario it was clear the case of the error...
Error cause: I had two versions of the same service (v1 and v2), and an Istio VirtualService configured with traffic route destination using weights. Then, 95% goes to v1 and 5% goes to v2. As I didn't have the v1 deployed (yet), of course, the error "503 - no healthy upstream" shows up 95% of the requests.
Ok, even so, I knew the problem and how to fix it (just deploy v1), I was wondering... But, how can I have more information about this error? How could I get a deeper analysis of this error to find out what was happening?
This is a way of investigating using the configuration command line utility of Istio, the istioctl:
# 1) Check the proxies status -->
$ istioctl proxy-status
# Result -->
NAME CDS LDS EDS RDS PILOT VERSION
...
teachstore-course-v1-74f965bd84-8lmnf.development SYNCED SYNCED SYNCED SYNCED istiod-86798869b8-bqw7c 1.5.0
...
...
# 2) Get the name outbound from JSON result using the proxy (service with the problem) -->
$ istioctl proxy-config cluster teachstore-course-v1-74f965bd84-8lmnf.development --fqdn teachstore-student.development.svc.cluster.local -o json
# 2) If you have jq install locally (only what we need, already extracted) -->
$ istioctl proxy-config cluster teachstore-course-v1-74f965bd84-8lmnf.development --fqdn teachstore-course.development.svc.cluster.local -o json | jq -r .[].name
# Result -->
outbound|80||teachstore-course.development.svc.cluster.local
inbound|80|9180-tcp|teachstore-course.development.svc.cluster.local
outbound|80|v1|teachstore-course.development.svc.cluster.local
outbound|80|v2|teachstore-course.development.svc.cluster.local
# 3) Check the endpoints of "outbound|80|v2|teachstore-course..." using v1 proxy -->
$ istioctl proxy-config endpoints teachstore-course-v1-74f965bd84-8lmnf.development --cluster "outbound|80|v2|teachstore-course.development.svc.cluster.local"
# Result (the v2, 5% of the traffic route is ok, there are healthy targets) -->
ENDPOINT STATUS OUTLIER CHECK CLUSTER
172.17.0.28:9180 HEALTHY OK outbound|80|v2|teachstore-course.development.svc.cluster.local
172.17.0.29:9180 HEALTHY OK outbound|80|v2|teachstore-course.development.svc.cluster.local
# 4) However, for the v1 version "outbound|80|v1|teachstore-course..." -->
$ istioctl proxy-config endpoints teachstore-course-v1-74f965bd84-8lmnf.development --cluster "outbound|80|v1|teachstore-course.development.svc.cluster.local"
ENDPOINT STATUS OUTLIER CHECK CLUSTER
# Nothing! Emtpy, no Pods, that's explain the "no healthy upstream" 95% of time.
Upvotes: 6
Reputation: 31
delete destinationrules.networking.istio.io and recreate the virtualservice.networking.istio.io
[root@10-20-10-110 ~]# curl http://dprovider.example.com:31400/dw/provider/beat
no healthy upstream[root@10-20-10-110 ~]#
[root@10-20-10-110 ~]# curl http://10.210.11.221:10100/dw/provider/beat
"该服务节点 10.210.11.221 心跳正常!"[root@10-20-10-110 ~]#
[root@10-20-10-110 ~]#
[root@10-20-10-110 ~]# cat /home/example_service_yaml/vs/dw-provider-service.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: dw-provider-service
namespace: example
spec:
hosts:
- "dprovider.example.com"
gateways:
- example-gateway
http:
- route:
- destination:
host: dw-provider-service
port:
number: 10100
subset: "v1-0-0"
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: dw-provider-service
namespace: example
spec:
host: dw-provider-service
subsets:
- name: "v1-0-0"
labels:
version: 1.0.0
[root@10-20-10-110 ~]# vi /home/example_service_yaml/vs/dw-provider-service.yaml
[root@10-20-10-110 ~]# kubectl -n example get vs -o wide | grep dw
dw-collection-service [example-gateway] [dw.collection.example.com] 72d
dw-platform-service [example-gateway] [dplatform.example.com] 81d
dw-provider-service [example-gateway] [dprovider.example.com] 21m
dw-sync-service [example-gateway] [dw-sync-service dsync.example.com] 34d
[root@10-20-10-110 ~]# kubectl -n example delete vs dw-provider-service
virtualservice.networking.istio.io "dw-provider-service" deleted
[root@10-20-10-110 ~]# kubectl -n example delete d dw-provider-service
daemonsets.apps deniers.config.istio.io deployments.extensions dogstatsds.config.istio.io
daemonsets.extensions deployments.apps destinationrules.networking.istio.io
[root@10-20-10-110 ~]# kubectl -n example delete destinationrules.networking.istio.io dw-provider-service
destinationrule.networking.istio.io "dw-provider-service" deleted
[root@10-20-10-110 ~]# kubectl apply -f /home/example_service_yaml/vs/dw-provider-service.yaml
virtualservice.networking.istio.io/dw-provider-service created
[root@10-20-10-110 ~]# curl http://dprovider.example.com:31400/dw/provider/beat
"该服务节点 10.210.11.221 心跳正常!"[root@10-20-10-110 ~]#
[root@10-20-10-110 ~]#
Upvotes: 3
Reputation: 2195
I faced the issue, when I my pod was in ContainerCreating
state. So, it resulted in 503 error. Also as @pegaldon, explained it can also occur due to incorrect route configuration or there are no gateways created by user.
Upvotes: 2
Reputation: 603
Although this is a somewhat general error resulting from a routing issue within an improper Istio setup, I will provide a general solution/piece of advice to anyone coming across the same issue.
In my case the issue was due to incorrect route rule configuration, the Kubernetes native services were functioning however the Istio routing rules were incorrectly configured so Istio could not route from the ingress into the service.
Upvotes: 5