Reputation: 18790
Context: the official PostgreSQL image, https://github.com/docker-library/postgres (GitHub repo) and https://registry.hub.docker.com/_/postgres/ (Docker Hub).
As can be seen there, the database is started by ENTRYPOINT and CMD through the bash script
/docker-entrypoint.sh
with
ENTRYPOINT ["/docker-entrypoint.sh"]
EXPOSE 5432
CMD ["postgres"]
Another hook provided for changing the database is the directory
/docker-entrypoint-initdb.d
which means the database only starts (and can be reached, e.g. with psql) at run time, when the docker run command is issued.
This causes a problem: we cannot customize the database at build time, for example to add extensions or populate it with data.
Of course, it could be done at run time, but that has the disadvantage of repeating the operation every time the image is run.
So what is the logic behind this design, from the Docker or Postgres perspective? How could I add extensions and populate data at build time?
Upvotes: 2
Views: 836
Reputation: 103965
If you were to customize a database (create it, populate it with data) at build time, that would imply that the database data is written into the Docker image filesystem itself, as one cannot mount a volume at build time.
The issue is that the Docker image filesystem is a special, layered copy-on-write one (AUFS, btrfs, etc.) which does not deliver good I/O performance for data-intensive applications such as a database server.
As a consequence, you want your data written to a volume rather than to the container filesystem. Since you don't know at build time which volume will be used at run time, and since there is no means of mounting volumes at build time anyway, a database should not be created at build time.
Furthermore, if you take a close look at the Dockerfile of the official PostgreSQL image, you will see a VOLUME instruction that turns the path where the data is written into a volume. That means the image is designed so that the data never hits the container filesystem.
If you look at the Dockerfiles of other databases or data-intensive applications, you will notice that they all operate in this manner. Another reason is that it is considered good practice to make your Docker containers immutable.
If you want to install additional modules in your image, that is fine as long as they do not depend on data that would be written to a volume, and as long as you declare a volume for any path they write data to.
Application code/binary → docker image filesystem
Application data → docker volume
Upvotes: 3
Reputation: 2402
This is right from the docker page for the postgres image (library/postgres):
If you would like to do additional initialization in an image derived from this one, add a *.sql or *.sh script under /docker-entrypoint-initdb.d (creating the directory if necessary). After the entrypoint calls initdb to create the default postgres user and database, it will run any *.sql files and source any *.sh script found in that directory to do further initialization before starting the service.
You can also extend the image with a simple Dockerfile to set the locale. The following example will set the default locale to de_DE.utf8:
FROM postgres:9.4
RUN localedef -i de_DE -c -f UTF-8 -A /usr/share/locale/locale.alias de_DE.UTF-8
ENV LANG de_DE.utf8
Since database initialization only happens on container startup, this allows us to set the language before it is created.
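To make the init-script hook quoted above concrete, here is a minimal sketch of a *.sh file that could be placed in /docker-entrypoint-initdb.d. The file name, the hstore extension, and the settings table are placeholders picked for illustration, not something from the docs. It also assumes a version of the image whose entrypoint makes POSTGRES_USER available to the script and has a server running that psql can connect to during initialization.

```shell
#!/bin/bash
# Hypothetical file: /docker-entrypoint-initdb.d/10-init.sh
set -e

# Create an extension and seed a table. The extension (hstore) and the
# table (settings) are placeholders chosen for this sketch.
init_db() {
    psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" <<'EOSQL'
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE TABLE IF NOT EXISTS settings (key text PRIMARY KEY, value text);
INSERT INTO settings (key, value) VALUES ('schema_version', '1');
EOSQL
}

# Assumption: the entrypoint provides POSTGRES_USER. Skip the work if
# the file is sourced anywhere else.
if [ -n "${POSTGRES_USER:-}" ]; then
    init_db
fi
```

In a derived image, a file like this could be added with a COPY instruction in the Dockerfile. As far as I can tell the entrypoint only runs these scripts when the data directory is still empty, so the seeding should happen once per fresh volume rather than on every container start.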
You can extend an image just as the example from the docs pasted above shows. You can also use the docker exec command to execute virtually anything within the container right from your host machine. It took me a little while to get used to it, and I continue to discover things as I play with it more.
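As a rough illustration of the exec approach, the sketch below wraps docker exec in a small helper that runs one SQL statement through psql inside a running container. The helper name pg_exec is made up here, and it assumes the container accepts local connections as the postgres user without a password prompt.

```shell
#!/bin/bash
# Run a single SQL statement inside a running container via docker exec.
# pg_exec is a hypothetical helper name used only for this sketch.
pg_exec() {
    local container="$1"   # container name, e.g. some-postgres
    local sql="$2"         # SQL statement to execute
    docker exec -i "$container" psql -U postgres -c "$sql"
}

# Example call (commented out so the file can be sourced safely):
# pg_exec some-postgres 'CREATE EXTENSION IF NOT EXISTS hstore;'
```

Changes made this way live in the container's data directory, so they persist only as long as that data does.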
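UPDATE: to keep the data on the host, mount a host directory over the image's data directory (/var/lib/postgresql/data):
sudo docker run --name some-postgres -v ~/PATH/TO/some-postgres/data:/var/lib/postgresql/data -p 127.0.0.1:5432:5432 -e POSTGRES_PASSWORD=test -d postgres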
Upvotes: 1