Benjamin Pannell

How to migrate a MongoDB database between Docker containers?

Migrating databases in MongoDB is a pretty well-understood problem domain, and there is a range of tools available for doing so at the host level: everything from mongodump and mongoexport to rsync on the raw data files. If you're getting very fancy, you can use network mounts like SSHFS and NFS to mitigate disk space and IOPS constraints.

Migrating a Database on a Host

# Using a temporary archive
mongodump    --db my_db --gzip --archive=/tmp/my_db.dump --port 27017
mongorestore --db my_db --gzip --archive=/tmp/my_db.dump --port 27018
rm /tmp/my_db.dump

# Or you can stream it...
mongodump      --db my_db --port 27017 --archive \
| mongorestore --db my_db --port 27018 --archive
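
For completeness, the "rsync on the data files" route mentioned above looks roughly like the following. This is only a sketch: it assumes a dbPath of /var/lib/mongodb, a reachable host called new_host, and that you can afford to stop mongod on both sides so the data files are consistent while the copy runs.

# Stop both servers, copy the dbPath across, then start them again
sudo systemctl stop mongod
ssh new_host 'sudo systemctl stop mongod'
# (assumes the ssh user can write to the target path; adjust ownership/permissions as needed)
sudo rsync -az /var/lib/mongodb/ new_host:/var/lib/mongodb/
ssh new_host 'sudo systemctl start mongod'
sudo systemctl start mongod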

Performing the same migrations in a containerized environment, however, can be somewhat more complicated, and the lightweight, purpose-specific nature of containers means you often don't have the same set of tools available to you.

As an engineer managing containerized infrastructure, I'm interested in what approaches can be used to migrate a database from one container/cluster to another whether for backup, cluster migration or development (data sampling) purposes.

For the purpose of this question, let's assume that the database is NOT a multi-TB cluster spread across multiple hosts and seeing thousands(++) of writes per second (i.e. that you can make a backup and have "enough" data for it to be valuable without needing to worry about replicating oplogs etc).

Upvotes: 5


Answers (1)

Benjamin Pannell

I've used a couple of approaches to solve this before. The specific approach depends on what I'm doing and what requirements I need to work within.

1. Working with files inside the container

# Dump the old container's DB to an archive file within the container
docker exec $OLD_CONTAINER \
     bash -c 'mongodump --db my_db --gzip --archive=/tmp/my_db.dump'

# Copy the archive from the old container to the new one
docker cp $OLD_CONTAINER:/tmp/my_db.dump $NEW_CONTAINER:/tmp/my_db.dump

# Restore the archive in the new container
docker exec $NEW_CONTAINER \
     bash -c 'mongorestore --db my_db --gzip --archive=/tmp/my_db.dump'

This approach works quite well and avoids many of the encoding issues you run into when piping data over stdout. However, it doesn't work particularly well when migrating to containers on different hosts (you need to docker cp the archive to a local file and then repeat the process to copy that local file to the new host), or when migrating from, say, Docker to Kubernetes.

Migrating to a different Docker cluster

# Dump the old container's DB to an archive file within the container
docker -H old_cluster exec $OLD_CONTAINER \
     bash -c 'mongodump --db my_db --gzip --archive=/tmp/my_db.dump'

# Copy the archive from the old container to the new one (via your machine)
docker -H old_cluster cp $OLD_CONTAINER:/tmp/my_db.dump /tmp/my_db.dump
docker -H old_cluster exec $OLD_CONTAINER rm /tmp/my_db.dump
docker -H new_cluster cp /tmp/my_db.dump $NEW_CONTAINER:/tmp/my_db.dump
rm /tmp/my_db.dump

# Restore the archive in the new container
docker -H new_cluster exec $NEW_CONTAINER \
     bash -c 'mongorestore --db my_db --gzip --archive=/tmp/my_db.dump'
docker -H new_cluster exec $NEW_CONTAINER rm /tmp/my_db.dump

Downsides

The biggest downside to this approach is the need to store temporary dump files everywhere. In the best case you'd have a dump file in your old container and another in your new container; in the worst case you'd have a third on your local machine (or potentially on multiple machines if you need to scp/rsync it around). These temp files are easily forgotten about, wasting space and cluttering your containers' filesystems.
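
If leftover dumps are a concern, one mitigation (a sketch reusing the $NEW_CONTAINER placeholder from above, not part of the original commands) is to chain the cleanup onto the restore so the temp file disappears as soon as the restore succeeds:

# Restore the archive and delete it in the same shell invocation
docker exec $NEW_CONTAINER \
     bash -c 'mongorestore --db my_db --gzip --archive=/tmp/my_db.dump && rm /tmp/my_db.dump'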

2. Copying over stdout

# Copy the database over stdout (base64 encoded)
docker exec $OLD_CONTAINER \
     bash -c 'mongodump --db my_db --gzip --archive 2>/dev/null | base64' \
| docker exec -i $NEW_CONTAINER \
     bash -c 'base64 --decode | mongorestore --db my_db --gzip --archive'

Copying the archive over stdout and passing it via stdin to the new container lets you remove the copy step and join the commands into a beautiful little one-liner (for some definition of beautiful). It also lets you mix and match hosts and even container schedulers...

Migrating between different Docker clusters

# Copy the database over stdout (base64 encoded)
docker -H old_cluster exec $(docker -H old_cluster ps -q -f 'name=mongo') \
    bash -c 'mongodump --db my_db --gzip --archive 2>/dev/null | base64' \
| docker -H new_cluster exec -i $(docker -H new_cluster ps -q -f 'name=mongo') \
    bash -c 'base64 --decode | mongorestore --db my_db --gzip --archive'

Migrating from Docker to Kubernetes

# Copy the database over stdout (base64 encoded)
docker exec $(docker ps -q -f 'name=mongo') \
    bash -c 'mongodump --db my_db --gzip --archive 2>/dev/null | base64' \
| kubectl exec -i mongodb-0 -- \
    bash -c 'base64 --decode | mongorestore --db my_db --gzip --archive'

Downsides

This approach works well in the "success" case, but when the dump fails partway through, the need to suppress the stderr stream (with 2>/dev/null) can make debugging the cause a serious headache.
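
One way to keep the stream clean while still retaining diagnostics (a sketch; /tmp/mongodump.log is just an illustrative path, not something the original commands use) is to redirect stderr to a file inside the old container rather than discarding it:

# Send mongodump's progress/errors to a log file instead of throwing them away
docker exec $OLD_CONTAINER \
     bash -c 'mongodump --db my_db --gzip --archive 2>/tmp/mongodump.log | base64' \
| docker exec -i $NEW_CONTAINER \
     bash -c 'base64 --decode | mongorestore --db my_db --gzip --archive'

# Inspect the log afterwards if something looks off
docker exec $OLD_CONTAINER cat /tmp/mongodump.log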

It is also roughly 33% less network-efficient than the file-based approach, since base64 encodes every 3 bytes of binary input as 4 bytes of text for transport (potentially a big issue for larger databases). As with all streaming approaches, there's also no way to inspect the data that was sent after the fact, which can make it harder to track down problems later.
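
If you want the convenience of streaming but still want something to inspect afterwards, one option (again a sketch, not part of the original workflow) is to tee the base64 stream to a local file on the way through:

# Keep a local copy of the (base64-encoded) archive while streaming it across
docker exec $OLD_CONTAINER \
     bash -c 'mongodump --db my_db --gzip --archive 2>/dev/null | base64' \
| tee /tmp/my_db.dump.b64 \
| docker exec -i $NEW_CONTAINER \
     bash -c 'base64 --decode | mongorestore --db my_db --gzip --archive'

The trade-off is that you're back to managing a dump file on your own machine, so this is mostly useful while debugging a migration rather than as a permanent workflow.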

Upvotes: 8
