Kubernetes pods not connecting to database service

Question

I am running a Kubernetes 1.25 cluster on Amazon EKS. I deployed the Anchore app using a Helm chart and modified the container images to pull from my AWS ECR repo instead of Docker Hub.

Looking at the logs of one of the pods, I see that it is trying to reach the database service and cannot resolve its host name.


(Background on this error at: https://sqlalche.me/e/14/e3q8)
[MainThread] 2023-04-30T00:06:41.155167 [anchore_enterprise_manager.util.db/connect_database()] [INFO] DB attempting to connect...
[MainThread] 2023-04-30T00:06:41.156165 [anchore_enterprise_manager.util.db/connect_database()] [WARN] DB connection failed, retrying - exception: test connection failed - exception: (psycopg2.OperationalError) could not translate host name "postgresql.anchore.svc.cluster.local:5432" to address: Name or service not known

Here is my postgresql service:

➜  ~ k get service postgres-postgresql
NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
postgres-postgresql   ClusterIP   172.20.191.83   <none>        5432/TCP   27h

➜  ~ k get endpoints postgres-postgresql
NAME                  ENDPOINTS        AGE
postgres-postgresql   10.1.0.74:5432   27h

Nothing in the postgres pod logs.

I've verified the AWS security groups are wide open and allow all traffic between the cluster and the nodes. I also verified that CoreDNS was working: I spun up a busybox pod and resolved the above service.

➜  anchore git:(main) ✗ k exec -it busybox-pod -- nslookup postgresql.anchore.svc.cluster.local
Server:     172.20.0.10
Address:    172.20.0.10:53


Name:   postgresql.anchore.svc.cluster.local
Address: 172.20.191.83
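
One detail I noticed when comparing this lookup with the error: the host name in the psycopg2 message ends in `:5432`, while the name I resolved from busybox has no port suffix. Below is a minimal sketch of separating the two parts, assuming the configured host string reaches the resolver verbatim. The helper name `split_host_port` is hypothetical and not part of Anchore or psycopg2; it only illustrates the difference between the two strings.

```python
def split_host_port(value, default_port=5432):
    """Split a 'host:port' string into (host, port).

    Hypothetical helper for illustration only. It does not handle
    bracketed IPv6 literals, and it assumes the port is numeric.
    """
    host, sep, port = value.rpartition(":")
    if not sep:
        # No colon present: the whole string is the host name.
        return value, default_port
    return host, int(port)


# The string exactly as it appears in the psycopg2 error message.
configured = "postgresql.anchore.svc.cluster.local:5432"
host, port = split_host_port(configured)
print(host, port)
```

The `host` part is the name that resolved from the busybox pod; the original string with the port still attached is what the error reports as untranslatable.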

Here are the logs from the postgresql pod

 k logs postgres-postgresql-59468ff768-zhn6z   
Defaulted container "postgresql" out of: postgresql, postgres-postgresql

PostgreSQL Database directory appears to contain a database; Skipping initialization

2023-04-30 14:52:22.289 UTC [1] LOG:  starting PostgreSQL 14.6 (Debian 14.6-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-04-30 14:52:22.289 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-04-30 14:52:22.289 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2023-04-30 14:52:22.292 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-04-30 14:52:22.296 UTC [27] LOG:  database system was shut down at 2023-04-30 14:52:21 UTC
2023-04-30 14:52:22.300 UTC [1] LOG:  database system is ready to accept connections

I've verified that the svc selectors match the pod labels.

➜  anchore git:(main) ✗ k describe svc  postgresql
Name:              postgresql
Namespace:         anchore
Labels:            app=postgresql
                   app.kubernetes.io/managed-by=Helm
                   chart=postgresql-1.0.1
                   heritage=Helm
                   release=postgres
Annotations:       meta.helm.sh/release-name: postgres
                   meta.helm.sh/release-namespace: anchore
Selector:          app=postgresql,release=postgres
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                172.20.191.83
IPs:               172.20.191.83
Port:              postgresql  5432/TCP
TargetPort:        postgresql/TCP
Endpoints:         
Session Affinity:  None
Events:

 k describe pods postgres-postgresql-59468ff768-zhn6z 
Name:             postgres-postgresql-59468ff768-zhn6z
Namespace:        anchore
Priority:         0
Service Account:  default
Node:             ip-10-1-0-223.us-gov-east-1.compute.internal/10.1.0.223
Start Time:       Sun, 30 Apr 2023 09:52:21 -0500
Labels:           app=postgresql
                  pod-template-hash=59468ff768
                  release=postgres
Annotations:      
Status:           Running
IP:               10.1.0.95
IPs:
  IP:           10.1.0.95
Controlled By:  ReplicaSet/postgres-postgresql-59468ff768
Containers:
  postgresql:
    Container ID:   containerd://4a76d4582bc4e443cd9dc93e578576f13de0194cc36ec1acff62e5e45dd0e070
    Image:          247301905713.dkr.ecr.us-gov-east-1.amazonaws.com/postgres:14
    Image ID:       247301905713.dkr.ecr.us-gov-east-1.amazonaws.com/postgres@sha256:db02f92063fb6083cb9dbf9d967ae0563d17d1e6332b6dfba6bdd7266c420ffa
    Port:           5432/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 30 Apr 2023 09:52:22 -0500
    Ready:          True
    Restart Count:  0
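
The selector check I did by eye on the two outputs above can be sketched as follows. This assumes the equality-based matching that Service selectors use: every key/value pair in the selector must appear in the pod's labels, and extra pod labels are ignored. The function name `selector_matches` is my own, and the dictionaries are copied from the `describe` output above.

```python
def selector_matches(selector, labels):
    """Return True if every selector key/value pair is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Selector from `k describe svc postgresql`.
service_selector = {"app": "postgresql", "release": "postgres"}

# Labels from `k describe pods postgres-postgresql-59468ff768-zhn6z`.
pod_labels = {
    "app": "postgresql",
    "pod-template-hash": "59468ff768",
    "release": "postgres",
}

print(selector_matches(service_selector, pod_labels))
```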

I would also like to add that in a few pods I'm seeing the readiness/liveness probes failing.

I've verified that no network policies are in use, no iptables rules are in place, and no security groups are blocking traffic.

  Type     Reason     Age                      From     Message
  Warning  BackOff    17m (x5347 over 43h)     kubelet  Back-off restarting failed container

 Warning  Unhealthy  7m26s (x13887 over 43h)  kubelet  Readiness probe failed:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to localhost port 8089: Connection refused
  Warning  Unhealthy  2m30s (x14341 over 43h)  kubelet  Readiness probe failed: Get "http://10.1.1.67:8668/health": dial tcp 10.1.1.67:8668: connect: connection refused

If anyone can point me in the right direction, it would be greatly appreciated. I have only been studying k8s for about 2 months, so I may be making an obvious blunder here. Let me know if any other output would help.

I've tried:

Verifying that nslookup resolves the svc name to the svc IP
Relaunching deployments, pods and svcs
Verifying AWS security groups and add-ons
Checking logs and events
Deleting pods
