Reputation: 3548
I've updated my Airbyte image from 0.35.2-alpha to 0.35.37-alpha (running in Kubernetes).
During the rollout the db pod wouldn't terminate, so I deleted the pod (a terrible mistake). When it came back up, I got this error:
PostgreSQL Database directory appears to contain a database; Skipping initialization
2022-02-24 20:19:44.065 UTC [1] LOG: starting PostgreSQL 13.6 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit
2022-02-24 20:19:44.065 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-02-24 20:19:44.065 UTC [1] LOG: listening on IPv6 address "::", port 5432
2022-02-24 20:19:44.071 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-02-24 20:19:44.079 UTC [21] LOG: database system was shut down at 2022-02-24 20:12:55 UTC
2022-02-24 20:19:44.079 UTC [21] LOG: invalid resource manager ID in primary checkpoint record
2022-02-24 20:19:44.079 UTC [21] PANIC: could not locate a valid checkpoint record
2022-02-24 20:19:44.530 UTC [1] LOG: startup process (PID 21) was terminated by signal 6: Aborted
2022-02-24 20:19:44.530 UTC [1] LOG: aborting startup due to startup process failure
2022-02-24 20:19:44.566 UTC [1] LOG: database system is shut down
Pretty sure the WAL file is corrupted, but I'm not sure how to fix this.
Upvotes: 4
Views: 9327
Reputation: 1
Another thing to consider is checking the PostgreSQL configuration for potential misalignments, like incorrect wal_level or checkpoint_timeout settings. Misconfigurations here can sometimes cause issues during recovery if checkpoints or WAL segments don’t align properly.
It’s also worth verifying that the storage layer (e.g., file system or RAID) isn’t introducing corruption. Silent disk errors can occasionally lead to problems like this.
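Since the server cannot start, these settings cannot be queried with SQL. One possible way to inspect them is pg_controldata, which reads the cluster's control file directly. The following is a minimal sketch, assuming the data directory path used elsewhere in this thread; summarize_controldata is a hypothetical helper name, not part of PostgreSQL:

```shell
# Keep only the control-file fields most relevant to a checkpoint failure.
# Reads pg_controldata output on stdin.
summarize_controldata() {
    grep -E "^(Database cluster state|Latest checkpoint location|Latest checkpoint's REDO location|wal_level setting):"
}

# Usage inside the database container, as the postgres user:
#   pg_controldata /var/lib/postgresql/data/pgdata | summarize_controldata
```

This only reports what the control file records. It does not by itself show whether the underlying storage corrupted the WAL.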
Upvotes: 0
Reputation: 1
Unfortunately this morning my system also had the same error.
The error has been resolved successfully and the database is operating stably again. No data loss detected.
Some suggestions to fix this error:
Back up the data folder to a separate location first, to avoid data loss.
Override the container entrypoint so that postgres is not started (and restarted) automatically:
services:
  database:
    image: "postgres:13.4-buster"
    entrypoint: ["tail", "-f", "/dev/null"]
    ...
Access the container and run the following commands:
> docker exec -it $(docker ps -q -f "name=<container-name>") bash
> pg_resetwal --dry-run /var/lib/postgresql/data/pgdata
> pg_resetwal /var/lib/postgresql/data/pgdata
Write-ahead log reset
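The backup step mentioned at the start of this list could look roughly like the sketch below. It is an illustration only: backup_pgdata is a hypothetical helper, /backup is an assumed destination, and it should be run only while postgres is stopped so the files are not changing underneath the copy.

```shell
# Copy a PostgreSQL data directory into a timestamped backup directory.
# Prints the path of the new copy on success.
backup_pgdata() {
    src="$1"        # e.g. /var/lib/postgresql/data/pgdata
    dest_root="$2"  # assumed backup location, e.g. /backup
    stamp="$(date -u +%Y%m%dT%H%M%SZ)"
    dest="$dest_root/pgdata-$stamp"

    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest_root" || return 1
    # -a copies recursively and preserves ownership, permissions and timestamps
    cp -a "$src" "$dest" || return 1
    echo "$dest"
}

# Usage (paths are assumptions):
#   backup_pgdata /var/lib/postgresql/data/pgdata /backup
```

Having the copy means the original files, including the damaged WAL, can still be restored if pg_resetwal makes things worse.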
Good luck!
Upvotes: 0
Reputation: 71
The su command is messing with PATH, so the easiest solution is to use gosu to drop from root to the postgres user:
gosu postgres pg_resetxlog /var/lib/postgresql/data
(On PostgreSQL 10 and later, pg_resetxlog is named pg_resetwal.)
Hopefully that works for you!
Upvotes: 0
Reputation: 3548
Warning - there is a potential for data loss
This is a test system, so I wasn't concerned with keeping the latest transactions, and had no backup.
First I overrode the container command to keep the container running but not try to start postgres.
...
spec:
  containers:
    - name: airbyte-db-container
      image: airbyte/db
      command: ["sh"]
      args: ["-c", "while true; do echo $(date -u) >> /tmp/run.log; sleep 5; done"]
...
And spawned a shell on the pod -
kubectl exec -it -n airbyte airbyte-db-xxxx -- sh
Then I ran pg_resetwal:
# dry-run first
pg_resetwal --dry-run /var/lib/postgresql/data/pgdata
Success!
pg_resetwal /var/lib/postgresql/data/pgdata
Write-ahead log reset
Then I removed the temporary command override from the container spec, and postgres started up correctly!
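For anyone repeating this, the dry-run-then-reset sequence can be wrapped in a small guard. This is only a sketch of one way to do it, not something the original steps used: reset_wal_checked is a hypothetical helper, and the checks for PG_VERSION and postmaster.pid are extra precautions added here.

```shell
# Run pg_resetwal only on a stopped cluster, with a dry run first.
reset_wal_checked() {
    datadir="$1"  # e.g. /var/lib/postgresql/data/pgdata

    # PG_VERSION exists at the top of every PostgreSQL data directory
    [ -f "$datadir/PG_VERSION" ] || {
        echo "not a data directory: $datadir" >&2
        return 1
    }
    # postmaster.pid is the server's lock file; if present, postgres may be running
    if [ -f "$datadir/postmaster.pid" ]; then
        echo "postmaster.pid exists in $datadir; is the server running?" >&2
        return 1
    fi
    # Show what would change, and stop if even the dry run fails
    pg_resetwal --dry-run "$datadir" || return 1
    pg_resetwal "$datadir"
}

# Usage inside the pod, as the postgres user:
#   reset_wal_checked /var/lib/postgresql/data/pgdata
```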
Upvotes: 7