Reputation: 2941
I'm not 100% clear whether this post is appropriate for stack overflow, or if it should go somewhere else - please suggest where else if it shouldn't be here.
I am trying to understand how docker images work. The particular reference in this case is the Dockerfile at https://github.com/frappe/frappe_docker/blob/main/images/production/Containerfile
This file contains VOLUME directives, and some of the RUN commands in it modify the contents of the paths in the VOLUME directives.
If I pull this image from docker hub, what happens with the data in the volumes?
Is it somehow contained in the pulled image? Or are the RUN commands only executed when you start the container in docker?
What happens if you bind mount a local directory onto one of the mount points mentioned in the VOLUME directive when you run the container?
It seems, on experimenting, that bind mounting will replace the data with the contents of the local folder (which means the data created by the RUN commands gets lost).
But, if you use regular named volumes when running the container, the data is there. But I thought regular volumes persist - so what happens when you pull a later version of the container, with perhaps different data in the volume? Does it change the original volume?
[Later]
The docs at https://docs.docker.com/storage/volumes/#:~:text=If%20you%20start%20a%20container%20which%20creates%20a%20new%20volume,are%20copied%20into%20the%20volume. say:
Populate a volume using a container
If you start a container which creates a new volume, and the container has files or directories in the directory to be mounted such as /app/, the directory’s contents are copied into the volume. The container then mounts and uses the volume, and other containers which use the volume also have access to the pre-populated content.
But this does not seem to work with bind mounts if the directory already exists.
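The rule quoted from the docs, and the difference with bind mounts, can be sketched without Docker at all. The script below is only a simulation: plain directories stand in for the image, a Docker-managed volume and a bind-mounted host directory, and the function names mount_volume and mount_bind are made up for this illustration. It assumes the behaviour described above, namely that image contents are copied only into a volume that is still empty, and never into a bind mount.

```shell
#!/bin/sh
# Simulation only: no Docker involved, directories stand in for the real things.
set -eu
work=$(mktemp -d)
image_dir="$work/image/test"   # stands in for /test inside the image
volume_dir="$work/volume"      # stands in for a Docker-managed volume
bind_dir="$work/bind"          # stands in for a bind-mounted host directory
mkdir -p "$image_dir" "$volume_dir" "$bind_dir"
echo "build 1" > "$image_dir/log"

# mount_volume IMAGE_DIR VOLUME_DIR
# Copy the image contents only if the volume is still empty.
mount_volume() {
    if [ -z "$(ls -A "$2")" ]; then
        cp -R "$1/." "$2/"
    fi
}

# mount_bind IMAGE_DIR HOST_DIR
# Nothing is copied: the host directory is used exactly as it is.
mount_bind() { :; }

mount_volume "$image_dir" "$volume_dir"   # first start: empty volume gets populated
first=$(cat "$volume_dir/log")

echo "build 2" > "$image_dir/log"         # a newer image with different data
mount_volume "$image_dir" "$volume_dir"   # second start: volume not empty, left alone
second=$(cat "$volume_dir/log")

mount_bind "$image_dir" "$bind_dir"
bind_files=$(ls -A "$bind_dir" | wc -l | tr -d ' ')

echo "volume after first start:  $first"
echo "volume after second start: $second"
echo "files in bind directory:   $bind_files"
rm -rf "$work"
```

Both volume lines print "build 1" and the bind directory stays empty, which is the pattern to look for when repeating the test with real containers.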
[Later] I have done some experiments to compare the behaviour of bind mount and docker volumes.
I used this Dockerfile:
FROM alpine
RUN mkdir /test \
&& echo 'echo "start `date`" >>/test/log' >/runtest \
&& echo 'echo "/log:"' >>/runtest \
&& echo 'cat /log' >>/runtest \
&& echo 'echo "/test/log:"' >>/runtest \
&& echo 'cat /test/log' >>/runtest \
&& echo 'sleep 99999999' >>/runtest \
&& chmod +x /runtest \
&& echo "build `date`" >/test/log \
&& echo "build `date`" >/log \
&& cat /runtest \
&& cat /log \
&& cat /test/log
CMD /runtest
VOLUME "/test"
This creates /test/log in a RUN command, containing the build date. The CMD (which is run every time the container starts) appends the start date to this file. The directory is made a VOLUME. Note that this has to be AFTER the RUN command, because the VOLUME copies the current contents of the directory into the volume; any RUN commands after that affect the original directory, but not the volume (see https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#volume).
I then ran it with the following docker-compose file:
version: "3"
services:
  tester:
    build: .
    volumes:
      # named volume "test" (declared at the bottom), mounted at /test
      - test:/test
volumes:
  test:
Build and run first time:
tester_1 | /log:
tester_1 | build Mon Feb 20 08:53:29 UTC 2023
tester_1 | /test/log:
tester_1 | build Mon Feb 20 08:53:29 UTC 2023
tester_1 | start Mon Feb 20 08:53:45 UTC 2023
Stop and run again:
tester_1 | /log:
tester_1 | build Mon Feb 20 08:54:28 UTC 2023
tester_1 | /test/log:
tester_1 | build Mon Feb 20 08:53:29 UTC 2023
tester_1 | start Mon Feb 20 08:53:45 UTC 2023
tester_1 | start Mon Feb 20 08:54:33 UTC 2023
Note /log (not in volume) is overwritten. /test/log (in volume) is not.
I replaced the volume in the docker-compose file with a bind mount.
When I built and ran it, I got the complaint: Service "tester" is using volume "/test" from the previous container.
I deleted the docker volume and tried again:
tester_1 | /log:
tester_1 | build Mon Feb 20 09:00:50 UTC 2023
tester_1 | /test/log:
tester_1 | start Mon Feb 20 09:01:03 UTC 2023
Note the result of the RUN command is not copied to the bind mount.
My conclusion is that if you (or other users of your Dockerfile) intend to use bind mounts, you must not expect the contents of volumes to be copied when those contents are created within the Dockerfile (except, obviously, anything the CMD creates when the container runs).
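The same comparison can be repeated with plain docker run instead of docker-compose. This is a rough sketch, not the exact commands used above: it assumes the image built from the test Dockerfile has been tagged voltest, and the names voltest_data, voltest_named, voltest_bind and ./bind-test are made up for the example.

```shell
#!/bin/sh
# Assumptions: image tag "voltest"; volume, container and directory names are hypothetical.

# Start with a named volume: an empty volume is populated from /test in the image.
run_with_named_volume() {
    docker run --rm -d --name voltest_named -v voltest_data:/test voltest
}

# Start with a bind mount: the host directory hides /test from the image,
# and nothing from the image is copied into it.
run_with_bind_mount() {
    mkdir -p ./bind-test
    docker run --rm -d --name voltest_bind -v "$(pwd)/bind-test:/test" voltest
}

# Print /test/log as seen inside a running container.
show_log() {
    docker exec "$1" cat /test/log
}

# Usage (needs a running Docker daemon):
#   docker build -t voltest .
#   run_with_named_volume && sleep 1 && show_log voltest_named
#   run_with_bind_mount   && sleep 1 && show_log voltest_bind
```

With the named volume the log should contain the "build" line followed by a "start" line; with the bind mount only the "start" line. The containers keep sleeping, so they need a docker stop afterwards.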
Upvotes: 0
Views: 850
Reputation: 24
I found this post that answers a lot of your questions. https://www.howtogeek.com/devops/understanding-the-dockerfile-volume-instruction/
If I pull this image from docker hub, what happens with the data in the volumes? Is it somehow contained in the pulled image? Or are the RUN commands only executed when you start the container in docker?
The data is contained in the image, and a new volume with a unique id is created. It's a way to enforce persistence for containers started from the image.
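One way to check this on a real system is to look at what the image declares and what a container actually mounted. The two helper functions below are a small sketch; their names are made up here, and the image and container names passed to them are placeholders.

```shell
#!/bin/sh
# Hypothetical helper names; pass your own image and container names.

# declared_volumes IMAGE
# Show the mount points declared with VOLUME in the image.
declared_volumes() {
    docker image inspect --format '{{ json .Config.Volumes }}' "$1"
}

# container_mounts CONTAINER
# Show the mounts of a container as JSON (type, volume name, source, destination).
container_mounts() {
    docker inspect --format '{{ json .Mounts }}' "$1"
}

# Usage (needs a running Docker daemon):
#   declared_volumes frappe/erpnext
#   container_mounts my_container
```

For a container started without any -v option, the mount for a declared VOLUME path should show up with type "volume" and a long generated name, which is the anonymous volume with a unique id.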
What happens if you bind mount a local directory onto one of the mount points mentioned in the VOLUME directive when you run the container? It seems, on experimenting, that bind mounting will replace the data with the contents of the local folder (which means the data created by the RUN commands gets lost). But, if you use regular named volumes when running the container, the data is there.
See https://www.howtogeek.com/devops/understanding-the-dockerfile-volume-instruction/#overriding-volume-instructions-when-starting-a-container. Specifying the mount manually when you start the container overrides the mount point, so the VOLUME instruction from the build no longer matters for that path.
But I thought regular volumes persist - so what happens when you pull a later version of the container, with perhaps different data in the volume? Does it change the original volume?
I couldn't find much information on this, but this post may be helpful: https://stackoverflow.com/a/52762779/6658374.
When you define a VOLUME in the Dockerfile, you can only define the target, not the source of the volume. During the build, you will only get an anonymous volume from this. That anonymous volume will be mounted at every RUN command, prepopulated with the contents of the image, and then discarded at the end of the RUN command. Only changes to the container are saved, not changes to the volume.
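So it seems that changes to the image are not carried over into an existing volume. With only the VOLUME instruction, a container started with docker run from the new image gets a new anonymous volume instead of using the previous one. An existing named volume is reused and keeps its old contents, which matches the unchanged "build" line in /test/log in your experiment.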
Upvotes: 0