Reputation: 21
Environment: Kubernetes with Istio sidecars injected.
I'm using bitnami/postgresql-ha as the database for my Airflow deployment, and I randomly see the log below in my PostgreSQL StatefulSet, which has 3 pods (image: bitnami/postgresql-repmgr:15.3.0-debian-11-r8). Sometimes it appears 10+ times a day, sometimes only once a day, and I can't find any pattern.
[2023-08-18 02:41:42] [WARNING] unable to ping "user=repmgr password=admin host=airflow-postgresql-1.airflow-postgresql-headless.workflow.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5"
[2023-08-18 02:41:42] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2023-08-18 02:41:42] [WARNING] unable to connect to upstream node "airflow-postgresql-1" (ID: 1001)
[2023-08-18 02:41:42] [NOTICE] node "airflow-postgresql-1" (ID: 1001) has recovered, reconnecting
[2023-08-18 02:41:42] [NOTICE] reconnected to upstream node after 0 seconds
Note: it always reconnects after 0 seconds.
This can cause the pgpool livenessProbe to fail with the event message below, which in turn causes Airflow tasks to fail.
Liveness probe failed: Checking pgpool health...
psql: error: connection to server on socket "/opt/bitnami/pgpool/tmp/.s.PGSQL.5432"
failed: ERROR: unable to read message kind DETAIL: kind does not match between main(0) slot[0] (52)
I've tried:
I've checked: the resources (CPU/memory) of all related pods are sufficient.
Upvotes: 0
Views: 930
Reputation: 1
I've been seeing the exact same thing. I tried some of the same steps you did and eventually found that one of my worker nodes had intermittent network connectivity issues. I got there because I was seeing DNS queries fail randomly. That said, it could also be CoreDNS not having enough resources to cope with the cluster's demands, so that is worth checking too.
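If you want to check for the same kind of problem, here is a minimal sketch (not something from the chart or repmgr itself) that separates DNS failures from TCP failures. The `probe` function, the 5-second timeout, and the 1-second polling interval are my own assumptions; the hostname is the one from the log in the question. Run it from a pod on each node and see whether failures cluster on one node.

```python
import socket
import time


def probe(host: str, port: int, timeout: float = 5.0) -> dict:
    """Resolve host, then open a TCP connection; report which step failed."""
    result = {"dns_ok": False, "tcp_ok": False,
              "dns_ms": None, "tcp_ms": None, "error": None}

    # Step 1: DNS resolution only.
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["error"] = f"DNS: {exc}"
        return result
    result["dns_ok"] = True
    result["dns_ms"] = (time.monotonic() - start) * 1000

    # Step 2: TCP connect to the first resolved address.
    family, socktype, proto, _, sockaddr = infos[0]
    start = time.monotonic()
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
    except OSError as exc:  # includes timeouts and connection refused
        result["error"] = f"TCP: {exc}"
        return result
    result["tcp_ok"] = True
    result["tcp_ms"] = (time.monotonic() - start) * 1000
    return result


if __name__ == "__main__":
    # Hostname taken from the repmgr log in the question; adjust as needed.
    HOST = ("airflow-postgresql-1.airflow-postgresql-headless."
            "workflow.svc.cluster.local")
    while True:
        r = probe(HOST, 5432)
        if not (r["dns_ok"] and r["tcp_ok"]):
            # Only failures are printed, with a timestamp to compare
            # against the repmgr warnings.
            print(time.strftime("%Y-%m-%d %H:%M:%S"), r["error"], flush=True)
        time.sleep(1)
```

If the failures are all `DNS:` errors, look at CoreDNS (resources, restarts, logs); if they are `TCP:` errors, the node network or the sidecar is a more likely suspect. Note that this only checks reachability, it does not speak the PostgreSQL protocol the way `PQping()` does.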
Upvotes: 0