pheeper

Reputation: 1527

Docker postgres invalid primary checkpoint record

I've been trying to get postgres running in Docker (on Windows) with a persistent data storage and thought I finally had it, but now when I try to start up my container I get the following errors:

LOG:  invalid primary checkpoint record
LOG:  invalid secondary checkpoint record
PANIC:  could not locate a valid checkpoint record

Here's how I got my container up and running to start with:

docker volume create ct_data
docker run --name postgres_ct -v ct_data:/var/lib/postgresql/data -p 5432:5432 postgres

I then connected to it using pgAdmin, created a new table, and imported a postgres dump file using the restore option. The data loaded without issue. I started a second container using the same volume as above, but with a different name and port number to test it and I was able to connect, so I shut it down.

Now, an hour later, I found my original container was stopped automatically so I tried to start it back up and got the errors above. This is my first attempt at using Docker, so it's very possible I'm missing something simple here.

Upvotes: 7

Views: 13592

Answers (2)

palc

Reputation: 407

The accepted answer in this related post suggests removing the Docker volume, which means that all the data in the PostgreSQL instance will be lost!

I recently encountered the same issue in an environment where we had months of stored data, so removing the volume was the very last resort.

I came across this answer, which suggests resetting the transaction write-ahead log (WAL). That works when the database is installed directly on the host, but not when it runs as a container: with Docker, the container itself fails to start because of the following error:

PANIC: could not locate a valid checkpoint record

Solution

The solution in this case is to use the docker run command with the postgres image, mount the same physical volume at the correct PGDATA path (the one you had configured earlier), and drop directly into a bash shell:

docker run -it -v /data/postgres_data:<PGDATA_path> postgres:14 /bin/bash

The <PGDATA_path> is /var/lib/postgresql/data by default if you haven't changed it. If you were using Docker volumes, find your volume name in the list of volumes with docker volume ls, then use docker run as below.

docker run -it -v <docker_volume_name>:<PGDATA_path> postgres:14 /bin/bash

Once you have a shell inside the container, reset the WAL using:

# Postgres >= 10
pg_resetwal <PGDATA_path>

# Postgres < 10
pg_resetxlog <PGDATA_path>

Use the -f (force) option if required. Refer to the PostgreSQL documentation on pg_resetwal for further details.

Now exit the container's shell and redeploy the container using docker-compose / docker stack, whichever you're using, and you'll have recovered your DB while retaining the data.
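For the redeploy step, a minimal docker-compose.yml along these lines would reattach the repaired volume. This is only a sketch: the service name, image tag, and password are illustrative, and ct_data is the volume name from the question.

```yaml
# Minimal sketch -- adjust image tag, names, and ports to your setup.
services:
  postgres:
    image: postgres:14
    environment:
      POSTGRES_PASSWORD: example    # placeholder; use your real credentials
    volumes:
      - ct_data:/var/lib/postgresql/data   # the volume you just repaired
    ports:
      - "5432:5432"

volumes:
  ct_data:
    external: true    # reuse the existing volume instead of creating a new one
```

Marking the volume as external tells Compose to attach the pre-existing volume rather than create a fresh (empty) one.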

Notes

  1. Some data may still be lost, in particular recent transactions that were only in the WAL and had not yet been checkpointed.
  2. Some db-dependent applications may throw further errors such as "The database log was reset, data inserted was not found". In such cases, you will have to reset those applications' stored log positions as well.
  3. Visit my answer here to get an understanding of when/why this happens.

Upvotes: 11

pheeper

Reputation: 1527

Thanks to the help of @Matt I figured out what happened. The volume (i.e. the postgres data directory) became corrupted as soon as I connected two containers to it at the same time.

It is possible to let multiple containers use a single database, but not by sharing volumes. Instead, run one container with the postgres daemon and have the other containers connect to it over TCP sockets. For more information on how to do this, read the docs.
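As a sketch of that setup: put the postgres container and its clients on the same user-defined Docker network, so clients reach the database by container name over TCP instead of touching the data volume. The network name and password here are made-up examples; ct_data and postgres_ct are the names from the question.

```shell
# Create a user-defined bridge network (containers on it can resolve
# each other by name).
docker network create ct_net

# One container owns the data volume and runs the postgres daemon.
docker run -d --name postgres_ct --network ct_net \
  -e POSTGRES_PASSWORD=secret \
  -v ct_data:/var/lib/postgresql/data postgres

# Other containers join the same network and connect over TCP, using
# the container name as the hostname -- no shared volume involved.
docker run -it --rm --network ct_net postgres \
  psql -h postgres_ct -U postgres
```

Only the postgres_ct container ever writes to ct_data, which avoids the corruption that comes from two postgres instances sharing one data directory.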

Upvotes: 3
