I am trying to install the PGO operator by following these docs. When I run this command
kubectl apply --server-side -k kustomize/install/default
the Pod starts and soon goes into CrashLoopBackOff.
What I have done: I checked the Pod's logs with this command
kubectl logs pgo-98c6b8888-fz8zj -n postgres-operator
Result
time="2023-01-09T07:50:56Z" level=debug msg="debug flag set to true" version=5.3.0-0
time="2023-01-09T07:51:26Z" level=error msg="Failed to get API Group-Resources" error="Get \"https://10.96.0.1:443/api?timeout=32s\": dial tcp 10.96.0.1:443: i/o timeout" version=5.3.0-0
panic: Get "https://10.96.0.1:443/api?timeout=32s": dial tcp 10.96.0.1:443: i/o timeout
goroutine 1 [running]:
main.assertNoError(...)
github.com/crunchydata/postgres-operator/cmd/postgres-operator/main.go:42
main.main()
github.com/crunchydata/postgres-operator/cmd/postgres-operator/main.go:84 +0x465
To check the network connection to the host, I ran this command
wget https://10.96.0.1:443/api
The Result is
--2023-01-09 09:49:30-- https://10.96.0.1/api
Connecting to 10.96.0.1:443... connected.
ERROR: cannot verify 10.96.0.1's certificate, issued by ‘CN=kubernetes’:
Unable to locally verify the issuer's authority.
To connect to 10.96.0.1 insecurely, use `--no-check-certificate'.
As you can see, the connection to the API server succeeds.
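(For completeness, a similar check can be run from inside the cluster to see whether Pods can reach the API server; this is only a sketch, assuming the curlimages/curl image can be pulled:)
kubectl run api-test -n postgres-operator --rm -it --restart=Never --image=curlimages/curl -- curl -k --max-time 10 https://10.96.0.1:443/api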
A strange detail that might be useful:
I ran kubectl get pods --all-namespaces and got this output
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-9gmmq 1/1 Running 0 3d16h
kube-flannel kube-flannel-ds-rcq8l 0/1 CrashLoopBackOff 10 (3m15s ago) 34m
kube-flannel kube-flannel-ds-rqwtj 0/1 CrashLoopBackOff 10 (2m53s ago) 34m
kube-system etcd-masterk8s-virtual-machine 1/1 Running 1 (5d ago) 3d17h
kube-system kube-apiserver-masterk8s-virtual-machine 1/1 Running 2 (5d ago) 3d17h
kube-system kube-controller-manager-masterk8s-virtual-machine 1/1 Running 8 (2d ago) 3d17h
kube-system kube-scheduler-masterk8s-virtual-machine 1/1 Running 7 (5d ago) 3d17h
postgres-operator pgo-98c6b8888-fz8zj 0/1 CrashLoopBackOff 7 (4m59s ago) 20m
As you can see, two of my kube-flannel Pods are also in CrashLoopBackOff and only one is running. I am not sure whether this is the main cause of the problem.
What do I want? I want to run the PGO Pod successfully with no errors.
How can you help me? Please help me find the issue, or suggest another way to get more detailed logs. I cannot find the root cause of this problem: if it were a network issue, why does the connection succeed? And if it is something else, how can I find more information?
Update: new errors after applying the fixes:
time="2023-01-09T11:57:47Z" level=debug msg="debug flag set to true" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Metrics server is starting to listen" addr=":8080" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="upgrade checking enabled" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="starting controller runtime manager and will wait for signal to exit" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting server" addr="[::]:8080" kind=metrics path=/metrics version=5.3.0-0
time="2023-01-09T11:57:47Z" level=debug msg="upgrade check issue: namespace not set" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1beta1.PostgresCluster" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.ConfigMap" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Endpoints" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.PersistentVolumeClaim" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Secret" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Service" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.ServiceAccount" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Deployment" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.StatefulSet" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Job" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Role" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.RoleBinding" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.CronJob" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.PodDisruptionBudget" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Pod" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.StatefulSet" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting Controller" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster version=5.3.0-0
W0109 11:57:48.006419 1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0109 11:57:48.006642 1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
time="2023-01-09T11:57:49Z" level=info msg="{\"pgo_versions\":[{\"tag\":\"v5.1.0\"},{\"tag\":\"v5.0.5\"},{\"tag\":\"v5.0.4\"},{\"tag\":\"v5.0.3\"},{\"tag\":\"v5.0.2\"},{\"tag\":\"v5.0.1\"},{\"tag\":\"v5.0.0\"}]}" X-Crunchy-Client-Metadata="{\"deployment_id\":\"288f4766-8617-479b-837f-2ee59ce2049a\",\"kubernetes_env\":\"v1.26.0\",\"pgo_clusters_total\":0,\"pgo_version\":\"5.3.0-0\",\"is_open_shift\":false}" version=5.3.0-0
W0109 11:57:49.163062 1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0109 11:57:49.163119 1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
W0109 11:57:51.404639 1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0109 11:57:51.404811 1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
W0109 11:57:54.749751 1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0109 11:57:54.750068 1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
W0109 11:58:06.015650 1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0109 11:58:06.015710 1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
W0109 11:58:25.355009 1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0109 11:58:25.355391 1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
W0109 11:59:10.447123 1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0109 11:59:10.447490 1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
time="2023-01-09T11:59:47Z" level=error msg="Could not wait for Cache to sync" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster error="failed to wait for postgrescluster caches to sync: timed out waiting for cache to be synced" version=5.3.0-0
time="2023-01-09T11:59:47Z" level=info msg="Stopping and waiting for non leader election runnables" version=5.3.0-0
time="2023-01-09T11:59:47Z" level=info msg="Stopping and waiting for leader election runnables" version=5.3.0-0
time="2023-01-09T11:59:47Z" level=info msg="Stopping and waiting for caches" version=5.3.0-0
time="2023-01-09T11:59:47Z" level=error msg="failed to get informer from cache" error="Timeout: failed waiting for *v1.PodDisruptionBudget Informer to sync" version=5.3.0-0
time="2023-01-09T11:59:47Z" level=error msg="error received after stop sequence was engaged" error="context canceled" version=5.3.0-0
time="2023-01-09T11:59:47Z" level=info msg="Stopping and waiting for webhooks" version=5.3.0-0
time="2023-01-09T11:59:47Z" level=info msg="Wait completed, proceeding to shutdown the manager" version=5.3.0-0
panic: failed to wait for postgrescluster caches to sync: timed out waiting for cache to be synced
goroutine 1 [running]:
main.assertNoError(...)
github.com/crunchydata/postgres-operator/cmd/postgres-operator/main.go:42
main.main()
github.com/crunchydata/postgres-operator/cmd/postgres-operator/main.go:118 +0x434
Upvotes: 0
Views: 1246
Reputation: 1890
If this is a new deployment, I suggest using v5.
That said, since PGO manages the networking for Postgres clusters (and, as such, manages listen_addresses), there is no reason to modify the listen_addresses configuration parameter. If you need to manage networking or network access, you can do that by setting the pg_hba configuration or by using NetworkPolicies.
Please go through the issue Custom 'listen_addresses' not applied #2904 for more information.
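For example, access to the Postgres Pods can be limited with a standard Kubernetes NetworkPolicy. This is only a sketch: the cluster name "hippo", the client namespace "my-app", and the Pod label are assumptions for illustration, and it requires a CNI that actually enforces NetworkPolicies.
kubectl apply -n postgres-operator -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-postgres
spec:
  # assumed label applied by PGO v5 to the Postgres Pods of a cluster named "hippo"
  podSelector:
    matchLabels:
      postgres-operator.crunchydata.com/cluster: hippo
  policyTypes:
    - Ingress
  ingress:
    - from:
        # allow traffic only from Pods in the assumed "my-app" namespace
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: my-app
      ports:
        - protocol: TCP
          port: 5432
EOF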
CrashLoopBackOff: check the Pod logs for configuration or deployment issues such as missing dependencies (for example, Kubernetes does not support docker-compose's depends_on, so such startup-ordering dependencies have to be handled differently), and also check whether Pods are being OOM-killed or using excessive resources.
Also check for timeout issues; see this lab on the timeout problem.
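A few standard commands for these checks (the Pod name is taken from your output; kubectl top needs metrics-server installed):
kubectl -n postgres-operator describe pod pgo-98c6b8888-fz8zj
kubectl -n postgres-operator get events --sort-by=.metadata.creationTimestamp
kubectl -n postgres-operator logs pgo-98c6b8888-fz8zj --previous
kubectl -n postgres-operator top pod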
ERROR: cannot verify 10.96.0.1's certificate, issued by ‘CN=kubernetes’:
Unable to locally verify the issuer's authority.
To connect to 10.96.0.1 insecurely, use `--no-check-certificate'.
Try this solution for the error above (a command sketch follows below):
First, remove the flannel.1 IP link on every host that has this problem.
Second, delete kube-flannel-ds from the cluster.
Last, recreate kube-flannel-ds; the flannel.1 IP link will be recreated and work correctly again.
(For flannel to work correctly, you must pass --pod-network-cidr=10.244.0.0/16 to kubeadm init, i.e., change the pod CIDR.)
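A rough sketch of those steps as commands (the DaemonSet name matches your Pod names, but the flannel manifest URL is an assumption; use whatever manifest you originally installed flannel from):
# run on every node that has the problem
sudo ip link delete flannel.1
# then, from a machine with kubectl access
kubectl -n kube-flannel delete daemonset kube-flannel-ds
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml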
Edit: please also check this similar issue and solution, which may help resolve your issue.
Upvotes: 0