Reputation: 4113
Migrating databases in MongoDB is a pretty well understood problem domain and there is a range of tools available to do so at the host level, everything from mongodump and mongoexport to rsync on the data files. If you're getting very fancy, you can use network mounts like SSHFS and NFS to mitigate disk space and IOPS constraints.
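As a sketch of the network-mount idea, assuming sshfs is installed and reachable (the hostname and paths below are placeholders, not real infrastructure):

```shell
# Mount a directory from another machine so the dump never touches local disk.
# (remote-host and the paths here are placeholders.)
mkdir -p /mnt/remote-dumps
sshfs user@remote-host:/data/dumps /mnt/remote-dumps

# Dump straight onto the network mount; restore from it later the same way.
mongodump --db my_db --gzip --archive=/mnt/remote-dumps/my_db.dump

# Unmount when you're done.
fusermount -u /mnt/remote-dumps
```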
Migrating a Database on a Host
# Using a temporary archive
mongodump --db my_db --gzip --archive=/tmp/my_db.dump --port 27017
mongorestore --db my_db --gzip --archive=/tmp/my_db.dump --port 27018
rm /tmp/my_db.dump
# Or you can stream it...
mongodump --db my_db --port 27017 --archive \
| mongorestore --db my_db --port 27018 --archive
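The streaming variant also extends across machines: assuming you have SSH access to the target host and mongorestore is installed there (the hostname below is a placeholder), the archive can be piped straight over the wire without touching disk on either side:

```shell
# Stream the dump over SSH directly into mongorestore on the remote host.
# (db-new.example.com is a placeholder hostname.)
mongodump --db my_db --gzip --archive --port 27017 \
  | ssh user@db-new.example.com \
      'mongorestore --db my_db --gzip --archive --port 27017'
```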
Performing the same migrations in a containerized environment, however, can be somewhat more complicated and the lightweight, purpose-specific nature of containers means that you often don't have the same set of tools available to you.
As an engineer managing containerized infrastructure, I'm interested in what approaches can be used to migrate a database from one container/cluster to another, whether for backup, cluster migration, or development (data sampling) purposes.
For the purpose of this question, let's assume that the database is NOT a multi-TB cluster spread across multiple hosts and seeing thousands(++) of writes per second (i.e. that you can make a backup and have "enough" data for it to be valuable without needing to worry about replicating oplogs etc).
Upvotes: 5
Views: 9073
Reputation: 4113
I've used a couple of approaches to solve this before. The specific approach depends on what I'm doing and what requirements I need to work within.
# Dump the old container's DB to an archive file within the container
docker exec $OLD_CONTAINER \
  bash -c 'mongodump --db my_db --gzip --archive=/tmp/my_db.dump'

# Copy the archive from the old container to the new one
# (docker cp can't copy container-to-container, so stage it locally)
docker cp $OLD_CONTAINER:/tmp/my_db.dump /tmp/my_db.dump
docker cp /tmp/my_db.dump $NEW_CONTAINER:/tmp/my_db.dump
rm /tmp/my_db.dump

# Restore the archive in the new container
docker exec $NEW_CONTAINER \
  bash -c 'mongorestore --db my_db --gzip --archive=/tmp/my_db.dump'
This approach works quite well and avoids many of the encoding issues you can hit when piping data over stdout. However, it doesn't work particularly well when migrating to containers on different hosts (you need to docker cp to a local file and then scp/rsync that file to the new host before repeating the process), or when migrating from, say, Docker to Kubernetes.
Migrating to a different Docker cluster
# Dump the old container's DB to an archive file within the container
docker -H old_cluster exec $OLD_CONTAINER \
  bash -c 'mongodump --db my_db --gzip --archive=/tmp/my_db.dump'

# Copy the archive from the old container to the new one (via your machine)
docker -H old_cluster cp $OLD_CONTAINER:/tmp/my_db.dump /tmp/my_db.dump
docker -H old_cluster exec $OLD_CONTAINER rm /tmp/my_db.dump
docker -H new_cluster cp /tmp/my_db.dump $NEW_CONTAINER:/tmp/my_db.dump
rm /tmp/my_db.dump

# Restore the archive in the new container
docker -H new_cluster exec $NEW_CONTAINER \
  bash -c 'mongorestore --db my_db --gzip --archive=/tmp/my_db.dump'
docker -H new_cluster exec $NEW_CONTAINER rm /tmp/my_db.dump
Downsides
The biggest downside to this approach is the need to store temporary dump files everywhere. In the best-case scenario you have a dump file in your old container and another in your new container; in the worst case you have a third on your local machine (or potentially on multiple machines if you need to scp/rsync it around). These temp files are easily forgotten about, wasting space and cluttering your containers' filesystems.
Streaming the database over stdout

# Copy the database over stdout (base64 encoded)
docker exec $OLD_CONTAINER \
  bash -c 'mongodump --db my_db --gzip --archive 2>/dev/null | base64' \
  | docker exec -i $NEW_CONTAINER \
      bash -c 'base64 --decode | mongorestore --db my_db --gzip --archive'
Copying the archive over stdout and passing it via stdin to the new container (note the -i on the receiving docker exec, which is required to keep stdin open) allows you to remove the copy step and join the commands into a beautiful little one-liner (for some definition of beautiful). It also allows you to potentially mix-and-match hosts and even container schedulers...
Migrating between different Docker clusters
# Copy the database over stdout (base64 encoded)
docker -H old_cluster exec $(docker -H old_cluster ps -q -f 'name=mongo') \
  bash -c 'mongodump --db my_db --gzip --archive 2>/dev/null | base64' \
  | docker -H new_cluster exec -i $(docker -H new_cluster ps -q -f 'name=mongo') \
      bash -c 'base64 --decode | mongorestore --db my_db --gzip --archive'
Migrating from Docker to Kubernetes
# Copy the database over stdout (base64 encoded)
docker exec $(docker ps -q -f 'name=mongo') \
  bash -c 'mongodump --db my_db --gzip --archive 2>/dev/null | base64' \
  | kubectl exec -i mongodb-0 -- \
      bash -c 'base64 --decode | mongorestore --db my_db --gzip --archive'
Downsides
This approach works well in the "success" case, but when the dump fails, the suppressed stderr stream (2>/dev/null) can make the cause a serious headache to debug.

It is also roughly 33% less network efficient than the file-based approach, since the data must be base64-encoded for transport (potentially a big deal for larger databases). As with all streaming approaches, there's also no way to inspect the data after the fact, which can be a problem if you need to track down a failure.
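Both downsides can be softened a little: redirect stderr to a file instead of /dev/null, and tee the base64 stream to disk so there is something to inspect afterwards. A sketch, re-using the same $OLD_CONTAINER/$NEW_CONTAINER variables (at the cost of re-introducing one temp file):

```shell
# Keep the dump's stderr for debugging and a copy of the stream for later
# inspection, while still restoring in one pipeline.
docker exec $OLD_CONTAINER \
  bash -c 'mongodump --db my_db --gzip --archive 2>/tmp/mongodump.err | base64' \
  | tee /tmp/my_db.dump.b64 \
  | docker exec -i $NEW_CONTAINER \
      bash -c 'base64 --decode | mongorestore --db my_db --gzip --archive'

# The saved copy round-trips back to a normal archive if you need it:
base64 --decode < /tmp/my_db.dump.b64 > /tmp/my_db.dump
```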
Upvotes: 8