Reputation: 174
I am running a Kubernetes 1.25 cluster on Amazon EKS. I deployed the Anchore app using a Helm chart. I modified the container images to pull from my AWS ECR repo instead of Docker.
Looking at the logs of one of the pods, I see that it is trying to access the database service and cannot resolve it.
(Background on this error at: https://sqlalche.me/e/14/e3q8)
[MainThread] 2023-04-30T00:06:41.155167 [anchore_enterprise_manager.util.db/connect_database()] [INFO] DB attempting to connect...
[MainThread] 2023-04-30T00:06:41.156165 [anchore_enterprise_manager.util.db/connect_database()] [WARN] DB connection failed, retrying - exception: test connection failed - exception: (psycopg2.OperationalError) could not translate host name "postgresql.anchore.svc.cluster.local:5432" to address: Name or service not known
Here is my postgresql service:
➜ ~ k get service postgres-postgresql
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
postgres-postgresql ClusterIP 172.20.191.83 5432/TCP 27h
➜ ~ k get endpoints postgres-postgresql
NAME ENDPOINTS AGE
postgres-postgresql 10.1.0.74:5432 27h
Nothing in the postgres pod logs.
I've verified the AWS security groups are wide open and allow all traffic between the cluster and nodes. I verified CoreDNS was working: I spun up a busybox pod and resolved the above service.
➜ anchore git:(main) ✗ k exec -it busybox-pod -- nslookup postgresql.anchore.svc.cluster.local
Server: 172.20.0.10
Address: 172.20.0.10:53
Name: postgresql.anchore.svc.cluster.local
Address: 172.20.191.83
Here are the logs from the postgresql pod
k logs postgres-postgresql-59468ff768-zhn6z
Defaulted container "postgresql" out of: postgresql, postgres-postgresql
PostgreSQL Database directory appears to contain a database; Skipping initialization
2023-04-30 14:52:22.289 UTC [1] LOG: starting PostgreSQL 14.6 (Debian 14.6-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-04-30 14:52:22.289 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2023-04-30 14:52:22.289 UTC [1] LOG: listening on IPv6 address "::", port 5432
2023-04-30 14:52:22.292 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-04-30 14:52:22.296 UTC [27] LOG: database system was shut down at 2023-04-30 14:52:21 UTC
2023-04-30 14:52:22.300 UTC [1] LOG: database system is ready to accept connections
I've verified that the svc selectors match the pod labels.
➜ anchore git:(main) ✗ k describe svc postgresql
Name: postgresql
Namespace: anchore
Labels: app=postgresql
app.kubernetes.io/managed-by=Helm
chart=postgresql-1.0.1
heritage=Helm
release=postgres
Annotations: meta.helm.sh/release-name: postgres
meta.helm.sh/release-namespace: anchore
Selector: app=postgresql,release=postgres
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 172.20.191.83
IPs: 172.20.191.83
Port: postgresql 5432/TCP
TargetPort: postgresql/TCP
Endpoints:
Session Affinity: None
Events: <none>
k describe pods postgres-postgresql-59468ff768-zhn6z
Name: postgres-postgresql-59468ff768-zhn6z
Namespace: anchore
Priority: 0
Service Account: default
Node: ip-10-1-0-223.us-gov-east-1.compute.internal/10.1.0.223
Start Time: Sun, 30 Apr 2023 09:52:21 -0500
Labels: app=postgresql
pod-template-hash=59468ff768
release=postgres
Annotations: <none>
Status: Running
IP: 10.1.0.95
IPs:
IP: 10.1.0.95
Controlled By: ReplicaSet/postgres-postgresql-59468ff768
Containers:
postgresql:
Container ID: containerd://4a76d4582bc4e443cd9dc93e578576f13de0194cc36ec1acff62e5e45dd0e070
Image: 247301905713.dkr.ecr.us-gov-east-1.amazonaws.com/postgres:14
Image ID: 247301905713.dkr.ecr.us-gov-east-1.amazonaws.com/postgres@sha256:db02f92063fb6083cb9dbf9d967ae0563d17d1e6332b6dfba6bdd7266c420ffa
Port: 5432/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 30 Apr 2023 09:52:22 -0500
Ready: True
Restart Count: 0
I would also like to add that I'm seeing the readiness/liveness probes failing in a few pods.
I've verified no network policies are being used. No iptables rules. No security groups are blocking traffic.
Type Reason Age From Message
Warning BackOff 17m (x5347 over 43h) kubelet Back-off restarting failed container
Warning Unhealthy 7m26s (x13887 over 43h) kubelet Readiness probe failed: % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (7) Failed to connect to localhost port 8089: Connection refused
Warning Unhealthy 2m30s (x14341 over 43h) kubelet Readiness probe failed: Get "http://10.1.1.67:8668/health": dial tcp 10.1.1.67:8668: connect: connection refused
If anyone can point me in the right direction, it would be greatly appreciated. I have only been studying k8s for about 2 months now, so I may be making an obvious blunder here. Let me know if any other output would help.
I've tried
Upvotes: 0
Views: 715
Reputation: 34426
This error:
could not translate host name "postgresql.anchore.svc.cluster.local:5432" to address: Name or service not known
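looks to me like the :5432 is being included in the host name, so the resolver is asked to look up the whole string, port and all. Your nslookup of the same name without the port succeeds, which is consistent with that. You haven't shared the app configuration or how this hostname is passed in, but make sure that the hostname does not include the port.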
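To illustrate the point, here is a minimal Python sketch, not taken from Anchore or the Helm chart. The helper names `split_host_port` and `can_resolve` are hypothetical, and the default port of 5432 is an assumption. It separates a `host:port` string into its two parts and lets you check whether each form resolves. It does not handle IPv6 literals.

```python
import socket
from typing import Tuple


def split_host_port(value: str, default_port: int = 5432) -> Tuple[str, int]:
    """Split 'host:port' into (host, port).

    Falls back to default_port (an assumed default) when no port is present.
    """
    host, sep, port = value.rpartition(":")
    if not sep:
        # No colon found: the whole string is the host name.
        return value, default_port
    return host, int(port)


def can_resolve(host: str) -> bool:
    """Return True if the system resolver can translate host to an address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    raw = "postgresql.anchore.svc.cluster.local:5432"
    host, port = split_host_port(raw)
    print(host, port)  # postgresql.anchore.svc.cluster.local 5432
    # Run inside the cluster to compare: the first lookup includes the port
    # in the name and is expected to fail, the second uses the bare host name.
    print("with port:", can_resolve(raw))
    print("host only:", can_resolve(host))
```

If the host-only lookup succeeds from inside a pod while the combined string fails, the fix is most likely in whatever chart value or environment variable supplies the database host: pass the host and the port as separate settings.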
Upvotes: 1