Hussein Nasser

Reputation: 440

Backup a postgres Container with its databases

So we have around 100 tests; each test connects to a postgres instance and consumes a database loaded with some data. The tests edit and change that data, so we reload the postgres database for each test.

This takes a really long time, so I thought of using Docker for this as follows. I'm new to Docker, so these are the steps I'm using:

1) I would create one postgres container, load it with the test database that I want and make it ready and polished.

2) Use this command to save my container as a tar file

 docker save -o postgres_testdatabase.tar postgres_testdatabase

3) For each test I load a new tar into an image

  docker load -i postgres_testdatabase.tar

4) Run the container with the postgres instance

docker run -i -p 5432 postgres_testdatabase

5) The test runs and changes the data..

6) Destroy the container and run a fresh container with a fresh test database

7) Run the second test, and so on (a rough sketch of this loop is below).
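
To make the intended per-test loop concrete, here is a minimal sketch of what steps 3-7 would look like; the image/tar names come from the steps above, and the test runner invocation is just a placeholder:

    # per-test cycle as described above
    docker load -i postgres_testdatabase.tar
    CID=$(docker run -d -p 5432:5432 postgres_testdatabase)

    # run one test against the freshly started postgres (placeholder command)
    ./run_one_test.sh

    # destroy the container so the next test starts from clean data
    docker rm -f "$CID"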

My problem is that I found out that when I back up a container to a tar, load it, and then run a new container, I do not get my database; I basically get a fresh postgres installation with none of my databases.

What am I doing wrong?

EDIT:

I tried one of the suggestions, committing my changes to an image before saving it, as follows:

I committed my updated container to a new image, saved that image to a tar file, and deleted my existing container. I then loaded the tar file and ran a new container from my saved image. I still don't see my databases. I believe it has something to do with volumes. How do I do this without volumes? How do I force all my data to be in the container so it gets backed up with the image?

EDIT2: Warmoverflow suggested I use an sql file to load all my data while building the image. This won't work in my case since the data is carefully authored using another software (ArcGIS), and the data has some complex blob fields (geometries), so loading it from an sql file won't work. He also suggested that I don't need to save the image as a tar if I'm spawning containers on the same machine: once I'm satisfied with my data and commit it to an image, I can run a new container from that image. Thanks for clarifying this. The problem remains: how do I keep my database within my image, so that when I run a container from the image the database comes with it?

EDIT3

So I found a workaround inspired by warmoverflow's suggestion; this should solve my problem. However, I'm still looking for a cleaner way to do this.

The solution is to do the following:

I would still really want the container image to have the database "in it", so that when I run a container from the image I get the database. It would be great if anyone could suggest a solution for that; it would save me a huge amount of time.

EDIT4: Finally, Warmoverflow solved it! Answer below.

Thanks

Upvotes: 2

Views: 3425

Answers (1)

Xiongbing Jin

Reputation: 12107

docker save is for images (saving an image as a tar file). What you need is docker commit, which commits container changes to an image; you can then save that image to a tar. But if your database is the same for all tests, you should build a custom image using a Dockerfile, and then run your containers from that single image.
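
For illustration, the commit-then-save flow looks roughly like this (a sketch only; testdb_container and postgres_testdb_image are placeholder names, not anything defined in the question):

    # snapshot the running container's filesystem changes into a new image
    docker commit testdb_container postgres_testdb_image

    # docker save operates on images, so save the committed image (not the container)
    docker save -o postgres_testdb_image.tar postgres_testdb_image

    # later: load the image back and start a container from it
    docker load -i postgres_testdb_image.tar
    docker run -d -p 5432:5432 postgres_testdb_image

Note that, as the update below explains, with the stock postgres image the committed image still would not include the data under /var/lib/postgresql/data, because that folder is declared as a VOLUME.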

If your data can be loaded from an sql file, you can follow the instructions in the "How to extend this image" section of the official postgres docker page https://hub.docker.com/_/postgres/. You can create a Dockerfile with the following content:

FROM postgres
# scripts placed in /docker-entrypoint-initdb.d are executed on the container's first startup
RUN mkdir -p /docker-entrypoint-initdb.d
ADD data.sql /docker-entrypoint-initdb.d/

Put your data.sql file and the Dockerfile in a new folder, and run docker build -t custom_postgres ., which will build a customized image for you; every time you run a new container from it, it will load the sql file on boot.
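
For reference, the build-and-run sequence for that Dockerfile would be roughly as follows (the password value here is only an example):

    # build the customized image from the folder containing Dockerfile and data.sql
    docker build -t custom_postgres .

    # each test can start a fresh container; data.sql is executed on first boot
    docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=123456 custom_postgres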

[Update]

Based on the new information from the question, the cause of the issue is that the official postgres image defines a VOLUME at the postgres data folder /var/lib/postgresql/data. VOLUME is used to persist data outside the container (for example when you use docker run -v to mount a host folder into the container), and thus any data inside the VOLUME is not saved when you commit the container itself. While this is normally a good idea, in this specific situation we actually need the data not to be persistent, so that a fresh new container with the same unmodified data can be started every time.
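
Concretely, the official image's Dockerfile contains a line like the one below (exact wording may vary slightly between postgres versions), and this is the line you will remove in the steps that follow:

    # declares the data directory as an anonymous volume; its contents live
    # outside the image layers and are therefore lost on docker commit
    VOLUME /var/lib/postgresql/data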

The solution is to create your own version of the postgres image, with the VOLUME removed.

  1. The files are at https://github.com/docker-library/postgres/tree/master/9.3
  2. Download both files to a new folder
  3. Remove the VOLUME line from Dockerfile
  4. In Docker Quickstart Terminal, switch to that folder, and run docker build -t mypostgres ., which will build your own postgres image with the name mypostgres.
  5. Use docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=123456 mypostgres to start your container. The postgres db is then available at postgres:123456@<your docker machine IP>:5432
  6. Put in your data as normal using ArcGIS
  7. Commit the container with docker commit container_id_from_step_5 mypostgres_withdata. This creates your own postgres image with data.
  8. Stop and remove the intermediate container: docker rm -f container_id_from_step_5
  9. Every time you need a new container, in Docker Quickstart Terminal run docker run -d -p 5432:5432 mypostgres_withdata to start a container, and remember to stop or remove the used container afterwards so that it won't occupy port 5432 (a small per-test sketch follows this list).
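
Putting step 9 into a per-test loop could look something like this sketch (run_single_test.sh is a hypothetical test runner, and the sleep is a crude stand-in for a proper readiness check; mypostgres_withdata is the image from step 7):

    # start a throwaway container from the pre-loaded image
    CID=$(docker run -d -p 5432:5432 mypostgres_withdata)

    # give postgres a moment to accept connections, then run one test against it
    sleep 5
    ./run_single_test.sh

    # remove the container so the next test sees untouched data and port 5432 is freed
    docker rm -f "$CID"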

Upvotes: 5
