Reputation: 14704
Docker's documentation states that volumes can be "migrated" - which I'm assuming means that I should be able to move a volume from one host to another host. (More than happy to be corrected on this point.) However, the same documentation page doesn't provide information on how to do this.
Digging around on SO, I have found an older question (circa 2015-ish) that states that this is not possible, but given that it's 2 years on, I thought I'd ask again.
In case it helps, I'm developing a Flask app that uses TinyDB + local disk as its data storage - I've determined I don't need anything fancier than that; this is a learning project at the moment, so I've decided to go extremely lightweight. The project is structured as such:
/project_directory
|- /app
|  |- __init__.py
|  |- ...
|- run.py  # assumes `data/databases/` and `data/files/` are present
|- Dockerfile
|- data/
|  |- databases/
|  |  |- db1.json
|  |  |- db2.json
|  |- files/
|     |- file1.pdf
|     |- file2.pdf
I have data/* in my .dockerignore and .gitignore, so that the data are not placed under version control and are ignored by Docker when building the images.
While developing the app, I am also trying to work with database entries and PDFs that are as close to real-world as possible, so I seeded the app with a very small subset of real data, which is stored on a volume that is mounted directly into data/ when the Docker container is instantiated.
What I want to do is deploy the container on a remote host, but have the remote host seeded with the starter data (ideally, this would be the volume that I've been using locally, for maximal convenience); later on as more data are added on the remote host, I'd like to be able to pull that back down so that during development I'm working with up-to-date data that my end users have entered.
Looking around, the "hacky" way I'm thinking of doing this is to simply use rsync, which might work out just fine. However, if there's a solution I'm missing, I'd greatly appreciate guidance!
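For concreteness, the rsync approach I have in mind would be something like the following (the host and paths are just placeholders, and it assumes data/ is a plain directory bind-mounted into the container on both hosts):
# push the local seed data up to the remote host (placeholder host/path)
rsync -avz --delete ./data/ user@remote-host:/srv/myapp/data/
# later, pull the remote data back down for local development
rsync -avz user@remote-host:/srv/myapp/data/ ./data/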
Upvotes: 20
Views: 32782
Reputation: 6033
According to the Docker docs you could also create a backup and restore it:
Backup volume
docker run --rm --volumes-from CONTAINER -v \
$(pwd):/backup ubuntu tar cvf /backup/backup.tar /MOUNT_POINT_OF_VOLUME
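You then need to get backup.tar over to the other host yourself; a plain scp would do (host and path are placeholders):
scp backup.tar user@other-host:/some/dir/
# run the restore command below from /some/dir/ on the other host, since it mounts $(pwd)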
Restore volume from backup on another host
docker run --rm --volumes-from CONTAINER -v \
$(pwd):/backup ubuntu bash -c "cd /MOUNT_POINT_OF_VOLUME && \
tar xvf /backup/backup.tar --strip 1"
OR (what I prefer) just copy it to local storage
docker cp --archive CONTAINER:/MOUNT_POINT_OF_VOLUME ./LOCAL_FOLDER
then copy it to the other host and start with e.g.
docker run -v $(pwd)/LOCAL_FOLDER:/MOUNT_POINT_OF_VOLUME some_image
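The "copy it to the other host" step is just an ordinary file transfer, for example (host and path are placeholders):
scp -r ./LOCAL_FOLDER user@other-host:/srv/myapp/LOCAL_FOLDER
# then run the docker run above on the other host, pointing -v at /srv/myapp/LOCAL_FOLDER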
Upvotes: 16
Reputation: 3481
You can use this trick:
docker run --rm -v <SOURCE_DATA_VOLUME_NAME>:/from alpine ash -c "cd /from ; tar -cf - . " | ssh <TARGET_HOST> 'docker run --rm -i -v <TARGET_DATA_VOLUME_NAME>:/to alpine ash -c "cd /to ; tar -xpvf - " '
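With concrete (placeholder) names filled in, that would look something like:
# stream a local named volume straight into a volume on the remote host
docker run --rm -v myapp_data:/from alpine ash -c "cd /from ; tar -cf - ." | ssh user@remote-host 'docker run --rm -i -v myapp_data:/to alpine ash -c "cd /to ; tar -xpvf -"'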
Upvotes: 5
Reputation: 5066
The way I would approach this is to generate a Docker container that stores a copy of the data you want to seed your development environment with. You can then expose the data in that container as a volume, and finally mount that volume into your development containers. I'll demonstrate with an example:
Creating the Data Container
Firstly we're just going to create a Docker container that contains your seed data and nothing else. I'd create a Dockerfile at ~/data/Dockerfile and give it the following content:
FROM alpine:3.4
ADD . /data
VOLUME /data
CMD /bin/true
You could then build this with:
docker build -t myproject/my-seed-data .
This will create a Docker image tagged as myproject/my-seed-data:latest. The image simply contains all of the data you want to seed the environment with, stored at /data within the image. Whenever we create an instance of the image as a container, it will expose all of the files within /data as a volume.
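If you want to sanity-check what got baked in, a throwaway container will list the contents (this just overrides the image's CMD):
docker run --rm myproject/my-seed-data ls -R /data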
Mounting the volume into another Docker container
I imagine you're running your Docker container something like this:
docker run -d -v $(pwd)/data:/data your-container-image <start_up_command>
You could now extend that to do the following:
docker run -d --name seed-data myproject/my-seed-data
docker run -d --volumes-from seed-data your-container-image <start_up_command>
What we're doing here is first creating an instance of your seed data container. We're then creating an instance of the development container and mounting the volumes from the data container into it. This means that you'll get the seed data at /data within your development container.
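You could confirm the mount from inside the running dev container with something like (the container name is a placeholder):
docker exec <your_dev_container> ls /data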
This gets to be a bit of a pain in that you now need to run two commands, so we could go ahead and orchestrate it a bit better with something like Docker Compose
Simple Orchestration with Docker Compose
Docker Compose is a way of running more than one container at the same time. You can declare what your environment needs to look like and do things like define:
"My development container depends on an instance of my seed data container"
You create a docker-compose.yml file to lay out what you need. It would look something like this:
version: '2'
services:
  seed-data:
    image: myproject/my-seed-data:latest
  my_app:
    build: .
    volumes_from:
      - seed-data
    depends_on:
      - seed-data
You can then start all containers at once using docker-compose up -d my_app
. Docker Compose is smart enough to first start an instance of your data container, and then your app container.
Sharing the Data Container between hosts
The easiest way to do this is to push your data container as an image to Docker Hub. Once you have built the image, it can be pushed to Docker Hub as follows:
docker push myproject/my-seed-data:latest
It's very similar in concept to pushing a Git commit to a remote repository, except in this case you're pushing a Docker image. What this does mean, however, is that any environment can now pull this image and use the data contained within it. That means you can re-generate the data image when you have new seed data, push it to Docker Hub under the :latest tag, and when you restart, your dev environment will have the latest data.
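On the remote host (or back on your dev machine), picking up new seed data is then just a pull and a re-create; roughly:
# fetch the latest seed data image and recreate the data container
docker pull myproject/my-seed-data:latest
docker rm -f seed-data    # ignore the error if the container doesn't exist yet
docker run -d --name seed-data myproject/my-seed-data:latest
docker run -d --volumes-from seed-data your-container-image <start_up_command>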
To me this is the "Docker" way of sharing data and it keeps things portable between Docker environments. You can also do things like have your data container generated on a regular basis by a job within a CI environment like Jenkins.
Upvotes: 11