Reputation: 1
I have installed https://github.com/zalando/postgres-operator on my K8S cluster and tried to deploy the PostgreSQL sample "minimal-postgres-manifest.yaml" provided in this git project.
After the 2 spilo pods are deployed, I can't connect to the databases (psql gets a connection timeout).
When I check the logs of the postgres-operator pod, I can see that the operator itself fails to connect to the database and can't initialize it.
time="2022-04-15T06:18:46Z" level=debug msg="syncing master service" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:18:46Z" level=debug msg="syncing replica service" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:18:46Z" level=debug msg="No load balancer created for the replica service" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:18:46Z" level=debug msg="syncing volumes using \"pvc\" storage resize mode" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:18:46Z" level=info msg="volume claims do not require changes" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:18:46Z" level=debug msg="syncing statefulsets" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:18:47Z" level=debug msg="making GET http request: http://10.244.2.13:8008/config" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:18:59Z" level=debug msg="making GET http request: http://10.244.1.14:8008/patroni" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:18:59Z" level=debug msg="making GET http request: http://10.244.2.13:8008/patroni" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:18:59Z" level=debug msg="syncing pod disruption budgets" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
W0415 06:18:59.347588 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
time="2022-04-15T06:18:59Z" level=debug msg="syncing roles" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:19:14Z" level=warning msg="could not connect to Postgres database: dial tcp: i/o timeout" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:19:29Z" level=warning msg="could not connect to Postgres database: dial tcp: i/o timeout" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:19:44Z" level=warning msg="could not connect to Postgres database: dial tcp: i/o timeout" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:19:59Z" level=warning msg="could not connect to Postgres database: dial tcp: i/o timeout" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:20:14Z" level=warning msg="could not connect to Postgres database: dial tcp: i/o timeout" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:20:29Z" level=warning msg="could not connect to Postgres database: dial tcp: i/o timeout" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:20:44Z" level=warning msg="could not connect to Postgres database: dial tcp: i/o timeout" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:20:59Z" level=warning msg="could not connect to Postgres database: dial tcp: i/o timeout" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:20:59Z" level=warning msg="error while syncing cluster state: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=default/acid-minimal-cluster pkg=cluster worker=1
time="2022-04-15T06:20:59Z" level=error msg="could not sync cluster: could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=default/acid-minimal-cluster pkg=controller worker=1
When I connect to one of the spilo pods with "kubectl exec", I can see that the role and database defined in the "minimal-postgres-manifest.yaml" file have not been created.
I deployed the Zalando postgres-operator and the PostgreSQL cluster by following the QuickStart procedure: https://github.com/zalando/postgres-operator/blob/master/docs/quickstart.md
I made just 3 changes to the provided "minimal-postgres-manifest.yaml" file: I changed the number of instances from 3 to 2, I decreased the volume size, and I declared a specific storage class.
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
  namespace: default
spec:
  teamId: "acid"
  volume:
    size: 500Mi
    storageClass: pg-openebs-sc
  numberOfInstances: 2
  users:
    zalando:  # database owner
    - superuser
    - createdb
    foo_user: []  # role for application foo
  databases:
    foo: zalando  # dbname: owner
  preparedDatabases:
    bar: {}
  postgresql:
    version: "14"
My storage class is based on OpenEBS, but I also tried "kubernetes.io/no-provisioner" with the same result. If I check the folders used by the PVs, I can see that some folders and files have been created by the PostgreSQL pods.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pg-openebs-sc
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      - name: StorageType
        value: hostpath
      - name: BasePath
        value: /var/lib/postgresql/data
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
My spilo pods are running; one has the master role and the other has the replica role:
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE                                                  NOMINATED NODE   READINESS GATES   SPILO-ROLE
acid-minimal-cluster-0   1/1     Running   0          15h   10.244.2.13   ctms-prod-vm-prod01-1a-02.prod.outscale.easyconform   <none>           <none>            master
acid-minimal-cluster-1   1/1     Running   0          15h   10.244.1.14   ctms-prod-vm-prod01-1a-01.prod.outscale.easyconform   <none>           <none>            replica
The end of the master pod logs contains:
2022-04-15 07:05:51,306 INFO: no action. I am (acid-minimal-cluster-0) the leader with the lock
2022-04-15 07:06:01,306 INFO: no action. I am (acid-minimal-cluster-0) the leader with the lock
2022-04-15 07:06:11,307 INFO: no action. I am (acid-minimal-cluster-0) the leader with the lock
The end of the replica pod logs contains:
2022-04-15 07:07:11,341 INFO: no action. I am a secondary (acid-minimal-cluster-1) and following a leader (acid-minimal-cluster-0)
2022-04-15 07:07:21,338 INFO: no action. I am a secondary (acid-minimal-cluster-1) and following a leader (acid-minimal-cluster-0)
2022-04-15 07:07:31,320 INFO: no action. I am a secondary (acid-minimal-cluster-1) and following a leader (acid-minimal-cluster-0)
Do you have any idea how to resolve these problems?
Upvotes: 0
Views: 4345
Reputation: 1
If the operator version in use above is 1.8.2, there is already an open issue about Kubernetes 1.24 reported on the Zalando GitHub.
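Independently of the version question, the "dial tcp: i/o timeout" messages suggest it may be worth checking plain TCP reachability of PostgreSQL from inside the cluster. The following is only a minimal sketch, not part of the operator: `can_connect` is a hypothetical helper name, port 5432 is assumed to be the PostgreSQL port, and the hosts to test are passed on the command line.

```python
import socket
import sys


def can_connect(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and opens the TCP socket;
        # DNS failures, refusals and timeouts are all OSError subclasses.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hosts are taken from the command line, so nothing is probed by default.
    for host in sys.argv[1:]:
        state = "reachable" if can_connect(host) else "NOT reachable"
        print(f"{host}:5432 {state}")
```

Run from a pod in the cluster, it could be given the pod IPs from the question (10.244.2.13 and 10.244.1.14) and the master service name, which is assumed here to be acid-minimal-cluster.default.svc.cluster.local. If the pod IPs are reachable but the service name is not, the problem is more likely DNS or the service than PostgreSQL itself.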
Upvotes: 0