Reputation: 496
Trying to wrap my head around Istio and the service mesh. I had a working cluster setup with an nginx ingress and cert-manager for TLS. I switched over to Istio with a Gateway/VirtualService setup, and as far as I can tell everything is connected, but when I try to access the site I get a blank screen (404 response in the network tab), and when I curl I see a 404. It's the same whether I hit the site directly or specify port 443. I'm not sure how to debug this: Istio's docs only mention a 404 when multiple gateways share the same TLS cert, but I only have the one gateway at this time. Also, the gateway and virtual service are in the same namespace, and in the virtual service the route for the backend (/api) is set before the frontend (/).
Here's the only error response I get, which is from curl with OPTIONS; a plain curl returns nothing at all, not even a 403. In the GKE console all workloads are healthy, with no errors in the logs.
curl -X OPTIONS https://app.example.net -I
HTTP/2 404
date: Wed, 29 Nov 2023 20:18:13 GMT
server: istio-envoy
The gateway's proxy logs show a connection to the upstream XDS server:
2023-11-19T20:48:48.798743Z info Readiness succeeded in 1.15333632s
2023-11-19T20:48:48.799470Z info Envoy proxy is ready
2023-11-19T21:17:44.948873Z info xdsproxy connected to upstream XDS server: istiod.istio-system.svc:15012
2023-11-19T21:47:40.301270Z info xdsproxy connected to upstream XDS server: istiod.istio-system.svc:15012
2023-11-19T22:18:07.530190Z info xdsproxy connected to upstream XDS server: istiod.istio-system.svc:15012
...
2023-11-20T08:48:48.028231Z info ads XDS: Incremental Pushing ConnectedEndpoints:2 Version:
2023-11-20T08:48:48.250424Z info cache generated new workload certificate latency=221.620042ms ttl=23h59m59.749615036s
2023-11-20T09:17:09.369171Z info xdsproxy connected to upstream XDS server: istiod.istio-system.svc:15012
2023-11-20T09:46:07.080923Z info xdsproxy connected to upstream XDS server: istiod.istio-system.svc:15012
...
The mesh shows connected sidecars for the gateway, frontend, and backend:
$ istioctl proxy-status
NAME CLUSTER CDS LDS EDS RDS ECDS ISTIOD VERSION
backend-deploy-67486897bb-fjv5g.demoapp Kubernetes SYNCED SYNCED SYNCED SYNCED NOT SENT istiod-64c94c5d78-5879x 1.19.3
demoapp-gtw-istio-674b96dcdb-mfsfg.demoapp Kubernetes SYNCED SYNCED SYNCED SYNCED NOT SENT istiod-64c94c5d78-5879x 1.19.3
frontend-deploy-6f6b4984b5-lnq4p.demoapp Kubernetes SYNCED SYNCED SYNCED SYNCED NOT SENT istiod-64c94c5d78-5879x 1.19.3
Gateway
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demoapp-gtw
  namespace: demoapp
  annotations:
    cert-manager.io/issuer: "letsencrypt-prod"
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      name: http
      number: 80
      protocol: HTTP
    hosts: [app.example.net]
    tls:
      httpsRedirect: true
  - port:
      name: https
      number: 443
      protocol: HTTPS
    hosts: [app.example.net]
    tls:
      mode: SIMPLE
      credentialName: demoapp-tls
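One note on `credentialName`, in case it's relevant (the Certificate itself isn't shown here, so this is an assumption about the setup): the TLS secret referenced by a Gateway must live in the same namespace as the ingress gateway pods (usually istio-system), not the application namespace, and cert-manager does not act on Gateway annotations the way it does on Ingress annotations, so the Certificate has to be created explicitly. A minimal sketch, with hypothetical names and a hypothetical cluster-scoped issuer:

```yaml
# Hypothetical Certificate resource; names and issuer kind are assumptions.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: demoapp-tls
  namespace: istio-system      # must match the ingress gateway's namespace
spec:
  secretName: demoapp-tls      # what the Gateway's credentialName points at
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer        # assumption; a namespaced Issuer would need to exist in istio-system
  dnsNames:
  - app.example.net
```

If the secret is missing or in the wrong namespace, the HTTPS listener can serve 404s even though everything else looks synced.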
Virtual Service
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: vert-serv-from-gw
spec:
  hosts: [ app.example.net ]
  gateways:
  - "demoapp/demoapp-gtw"
  - mesh
  http:
  - match:
    - uri:
        prefix: /api
    route:
    - destination:
        host: backend-svc
        port:
          number: 5000
    corsPolicy:
      allowOrigins:
      - exact: https://app.octodemo.net
      allowMethods:
      - PUT
      - GET
      - POST
      - PATCH
      - OPTIONS
      - DELETE
      allowHeaders:
      - DNT
      - X-CustomHeader
      - X-LANG
      - Keep-Alive
      - User-Agent
      - X-Requested-With
      - If-Modified-Since
      - Cache-Control
      - Content-Type
      - X-Api-Key
      - X-Device-Id
      - Access-Control-Allow-Origin
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: frontend-svc
        port:
          number: 3000
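One small thing worth double-checking (purely defensive, since the manifest above carries no namespace): if this VirtualService was ever applied without `-n demoapp`, it would land in `default`, and the short destination hosts would then resolve against that namespace. Pinning the namespace explicitly rules that out:

```yaml
# Defensive variant: pin the namespace in metadata so the apply context can't change it.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: vert-serv-from-gw
  namespace: demoapp   # explicit, instead of relying on kubectl's current namespace
```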
I'm not sure how to debug this further with no clear errors; if anyone has any suggestions, I'm all ears. Thanks.
EDIT: So I think I've dialed in a bit of what is happening. Running proxy-config routes for the gateway shows:
$ istioctl pc routes demoapp-gtw-istio-674b96dcdb-mfsfg.demoapp
NAME VHOST NAME DOMAINS MATCH VIRTUAL SERVICE
http.80 blackhole:80 * /* 404
https.443.default.demoapp-gtw-istio-autogenerated-k8s-gateway-https.demoapp blackhole:443 * /* 404
backend * /stats/prometheus*
backend * /healthz/ready*
My understanding of Istio's blackhole and passthrough clusters is that the blackhole exists to block unauthorized ingress and egress traffic to mesh services, but that the default is passthrough (ALLOW_ANY). In the istio configmap below I don't see either one defined (I took the cue from here):
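For reference, if the outbound policy had been set explicitly, it would appear inside the `mesh` block of that ConfigMap roughly like this (when the field is absent, Istio falls back to ALLOW_ANY, so its absence alone shouldn't blackhole anything):

```yaml
# Fragment of the istio ConfigMap's mesh block, if the policy were set explicitly:
mesh: |-
  outboundTrafficPolicy:
    mode: ALLOW_ANY   # the default; REGISTRY_ONLY is what routes unknown egress to the blackhole
```

Note the blackhole vhosts in the gateway route dump above are a different situation: a `blackhole:80` / `blackhole:443` vhost on a gateway listener generally means no VirtualService route got bound to that server, rather than an egress policy kicking in.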
$ kubectl get configmap istio -n istio-system -o yaml
apiVersion: v1
data:
  mesh: |-
    defaultConfig:
      discoveryAddress: istiod.istio-system.svc:15012
      proxyMetadata: {}
      tracing:
        zipkin:
          address: zipkin.istio-system:9411
    defaultProviders:
      metrics:
      - prometheus
    enablePrometheusMerge: true
    rootNamespace: istio-system
    trustDomain: cluster.local
  meshNetworks: 'networks: {}'
kind: ConfigMap
metadata:
  creationTimestamp: "2023-10-26T17:45:35Z"
  labels:
    install.operator.istio.io/owning-resource: installed-state
    install.operator.istio.io/owning-resource-namespace: istio-system
    istio.io/rev: default
    operator.istio.io/component: Pilot
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.19.3
    release: istio
  name: istio
  namespace: istio-system
  resourceVersion: "69895477"
  uid: 3c542bc5-5f9f-4486-a37c-2c04fadba0ed
Maybe that's because my version isn't up to date enough?
$ istioctl version
client version: 1.20.0
control plane version: 1.19.3
data plane version: 1.19.3 (3 proxies)
Regardless, my routes from the gateway to the services should not be getting blackholed, as they are declared in the virtual service... right?
Upvotes: 0
Views: 1896
Reputation: 496
Well, I don't have a solution, but I'm fairly certain I've found the problem.
The Istio routes are pointing to services belonging to another namespace:
$ istioctl pc routes backend-deploy-7f584f9fd7-mn5z4.demoapp
NAME VHOST NAME DOMAINS MATCH VIRTUAL SERVICE
test-frontend-svc.demotest.svc.cluster.local:3000 test-frontend-svc.demotest.svc.cluster.local:3000 * /*
9090 kiali.istio-system.svc.cluster.local:9090 kiali.istio-system, 10.92.12.180 /*
backend * /healthz/ready*
inbound|80|| inbound|http|80 * /*
inbound|80|| inbound|http|80 * /*
test-backend-svcs.demotest.svc.cluster.local:5000 test-backend-svcs.demotest.svc.cluster.local:5000 * /*
Based on a GitHub answer to another user's question (from 2019), "My understanding was that this is a known limitation with existing workaround: using distinct names for the ports solves the issue.", I even changed the port names to make them unique per namespace and shifted the port numbers by 1, but it is still pointing to the wrong services on the old port names.
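For what it's worth, Istio also infers the protocol from the Service port name prefix (http-, grpc-, tcp-), so a rename can keep both properties, unique per namespace and protocol-prefixed. A sketch of what that might look like on the backend Service (the actual Service manifest isn't shown here, so all names below are assumptions):

```yaml
# Hypothetical Service fragment; selector labels and targetPort are assumed.
apiVersion: v1
kind: Service
metadata:
  name: backend-svc
  namespace: demoapp
spec:
  selector:
    app: backend                   # assumed pod label
  ports:
  - name: http-backend-demoapp    # "http-" prefix for protocol selection, suffix unique per namespace
    port: 5001
    targetPort: 5000               # assumed container port
```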
Here's the updated virtual service after those changes:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: vert-serv-from-gw
spec:
  hosts: [ app.octodemo.net ]
  gateways:
  - "demoapp/demoapp-gtw"
  - mesh
  http:
  - match:
    - uri:
        prefix: /api
    route:
    - destination:
        host: backend-svc
        port:
          number: 5001
    corsPolicy:
      allowOrigins:
      - exact: https://app.octodemo.net
      allowMethods:
      - PUT
      - GET
      - POST
      - PATCH
      - OPTIONS
      - DELETE
      allowHeaders:
      - DNT
      - X-CustomHeader
      - X-LANG
      - Keep-Alive
      - User-Agent
      - X-Requested-With
      - If-Modified-Since
      - Cache-Control
      - Content-Type
      - X-Api-Key
      - X-Device-Id
      - Access-Control-Allow-Origin
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: frontend-svc
        port:
          number: 3001
That did not work: as shown above, Istio continues to point to the services in the other namespace (test-backend-svcs and test-frontend-svc). So while digging in their docs, I found this note about routes:
Note for Kubernetes users: When short names are used (e.g. “reviews” instead of “reviews.default.svc.cluster.local”), Istio will interpret the short name based on the namespace of the rule, not the service. A rule in the “default” namespace containing a host “reviews” will be interpreted as “reviews.default.svc.cluster.local”, irrespective of the actual namespace associated with the reviews service. To avoid potential misconfigurations, it is recommended to always use fully qualified domain names over short names.
So I tried that, using the fully qualified names from the service registry (backend-svc.demoapp.svc.cluster.local and frontend-svc.demoapp.svc.cluster.local) following this post's approach, and I'm still getting the same result: only the services from the other namespace, which was never configured, show up.
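Reconstructed from that description, the destinations then looked roughly like this (only the hosts changed; everything else as in the manifest above):

```yaml
# FQDN destinations, per the docs' recommendation above:
route:
- destination:
    host: backend-svc.demoapp.svc.cluster.local
    port:
      number: 5001
```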
There isn't even a Gateway or VirtualService in the other namespace; the only step I had taken there was enabling sidecar auto-injection. So I don't understand how or why this is happening: despite changing the config to point more specifically at the correct services (not that it should have been necessary), Istio is still routing to the services in another namespace on the wrong ports. I'm at a loss for what to do, other than dumping the cluster and starting fresh. If anyone has any idea how this came about, or has hit a similar issue, please let me know, as none of this has resolved the issue or pointed to something to avoid going forward.
Upvotes: 0