Hussein Nasser

Reputation: 440

Backup a postgres Container with its databases

So we have around 100 tests; each test connects to a postgres instance and consumes a database loaded with some data. The tests edit and change that data, so we reload the postgres database for each test.

This takes a really long time, so I thought of using Docker for this as follows. I'm new to Docker, so these are the steps I'm using:

1) I would create one postgres container, load it with the test database that I want and make it ready and polished.

2) Use this command to save my container as a tar file

 docker save -o postgres_testdatabase.tar postgres_testdatabase

3) For each test I load a new tar into an image

  docker load -i postgres_testdatabase.tar

4) Run the container with the postgres instance

docker run -i -p 5432 postgres_testdatabase

5) The test runs and changes the data..

6) Destroy the container and run a fresh container with a fresh test database

7) Run the second test, and so on (a rough sketch of this loop is below).
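
To make the intended per-test loop concrete, here is a minimal sketch of what steps 3-7 would look like; the image/tar names come from the steps above, and the test runner invocation is just a placeholder:

    # per-test cycle as described above
    docker load -i postgres_testdatabase.tar
    CID=$(docker run -d -p 5432:5432 postgres_testdatabase)

    # run one test against the freshly started postgres (placeholder command)
    ./run_one_test.sh

    # destroy the container so the next test starts from clean data
    docker rm -f "$CID"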

My problem is that I found out that when I back up a container to a tar, load it, and then run a new container, I do not get my database; I basically get a fresh postgres installation with none of my databases.

What am I doing wrong?

EDIT:

I tried one of the suggestions, committing my changes to an image before saving it, as follows:

I committed my updated container to a new image, saved that image to a tar file, and deleted my existing container. I then loaded the tar file and ran a new container from my saved image. I still don't see my databases. I believe it has something to do with volumes. How do I do this without volumes? How do I force all my data to be in the container so it gets backed up with the image?

EDIT2: Warmoverflow suggested I use an sql file to load all my data while building the image. This won't work in my case since the data is carefully authored using another software (ArcGIS), and the data has some complex blob fields (geometries), so loading it from an sql file won't work. He also suggested that I don't need to save the image as a tar if I'm spawning containers on the same machine: once I'm satisfied with my data and commit it to an image, I can run a new container from that image. Thanks for clarifying this. The problem remains: how do I keep my database within my image, so that when I run a container from the image the database comes with it?

EDIT3

So I found a workaround inspired by warmoverflow's suggestion; this should solve my problem. However, I'm still looking for a cleaner way to do this.

The solution is to do the following:

I would still really want the container image to have the database "in it", so that when I run a container from the image I get the database. It would be great if anyone could suggest a solution for that; it would save me a huge amount of time.

EDIT4: Finally, Warmoverflow solved it! Answer below.

Thanks

Upvotes: 2

Views: 3425

Answers (1)

Xiongbing Jin

Reputation: 12107

docker save is for images (saving an image as a tar file). What you need is docker commit, which commits container changes to an image; you can then save that image to a tar. But if your database is the same for all tests, you should build a custom image using a Dockerfile, and then run your containers from that single image.
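
For illustration, the commit-then-save flow looks roughly like this (a sketch only; testdb_container and postgres_testdb_image are placeholder names, not anything defined in the question):

    # snapshot the running container's filesystem changes into a new image
    docker commit testdb_container postgres_testdb_image

    # docker save operates on images, so save the committed image (not the container)
    docker save -o postgres_testdb_image.tar postgres_testdb_image

    # later: load the image back and start a container from it
    docker load -i postgres_testdb_image.tar
    docker run -d -p 5432:5432 postgres_testdb_image

Note that, as the update below explains, with the stock postgres image the committed image still would not include the data under /var/lib/postgresql/data, because that folder is declared as a VOLUME.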

If your data can be loaded from an sql file, you can follow the instructions in the "How to extend this image" section of the official postgres docker page https://hub.docker.com/_/postgres/. You can create a Dockerfile with the following content:

FROM postgres
# scripts placed in /docker-entrypoint-initdb.d are executed on the container's first startup
RUN mkdir -p /docker-entrypoint-initdb.d
ADD data.sql /docker-entrypoint-initdb.d/

Put your data.sql file and the Dockerfile in a new folder, and run docker build -t custom_postgres ., which will build a customized image for you; every time you run a new container from it, it will load the sql file on boot.
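
For reference, the build-and-run sequence for that Dockerfile would be roughly as follows (the password value here is only an example):

    # build the customized image from the folder containing Dockerfile and data.sql
    docker build -t custom_postgres .

    # each test can start a fresh container; data.sql is executed on first boot
    docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=123456 custom_postgres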

[Update]

Based on the new information from the question, the cause of the issue is that the official postgres image defines a VOLUME at the postgres data folder /var/lib/postgresql/data. VOLUME is used to persist data outside the container (for example when you use docker run -v to mount a host folder into the container), and thus any data inside the VOLUME is not saved when you commit the container itself. While this is normally a good idea, in this specific situation we actually need the data not to be persistent, so that a fresh new container with the same unmodified data can be started every time.
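
Concretely, the official image's Dockerfile contains a line like the one below (exact wording may vary slightly between postgres versions), and this is the line you will remove in the steps that follow:

    # declares the data directory as an anonymous volume; its contents live
    # outside the image layers and are therefore lost on docker commit
    VOLUME /var/lib/postgresql/data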

The solution is to create your own version of the postgres image, with the VOLUME removed.

  1. The files are at https://github.com/docker-library/postgres/tree/master/9.3
  2. Download both files to a new folder
  3. Remove the VOLUME line from Dockerfile
  4. In Docker Quickstart Terminal, switch to that folder, and run docker build -t mypostgres ., which will build your own postgres image with the name mypostgres.
  5. Use docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=123456 mypostgres to start your container. The postgres db is then available at postgres:123456@<your docker machine IP>:5432
  6. Put in your data as normal using ArcGIS
  7. Commit the container with docker commit container_id_from_step_5 mypostgres_withdata. This creates your own postgres image with data.
  8. Stop and remove the intermediate container: docker rm -f container_id_from_step_5
  9. Every time you need a new container, in Docker Quickstart Terminal run docker run -d -p 5432:5432 mypostgres_withdata to start a container, and remember to stop or remove the used container afterwards so that it won't occupy port 5432 (a small per-test sketch follows this list).
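
Putting step 9 into a per-test loop could look something like this sketch (run_single_test.sh is a hypothetical test runner, and the sleep is a crude stand-in for a proper readiness check; mypostgres_withdata is the image from step 7):

    # start a throwaway container from the pre-loaded image
    CID=$(docker run -d -p 5432:5432 mypostgres_withdata)

    # give postgres a moment to accept connections, then run one test against it
    sleep 5
    ./run_single_test.sh

    # remove the container so the next test sees untouched data and port 5432 is freed
    docker rm -f "$CID"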

Upvotes: 5
