Reputation: 3397
Is it safe to run mongodump
against a running server with many writes per second? Is it possible to get a corrupted dump this way?
From here:
Use
--oplog
to capture incoming write operations during the mongodump operation to ensure that the backups reflect a consistent data state.
Does it mean that, no matter how many writes the database receives, the dump will be consistent?
If I run mongodump --oplog
at 1 AM and it finishes at 2 AM, and then I run mongorestore --oplogReplay
— what state will I get?
From here:
However, the use of
mongodump
and mongorestore
as a backup strategy can be problematic for sharded clusters and replica sets.
but why? I have a replica set of 1 primary and 2 secondaries. What is the problem with running mongodump
against one of the secondaries? It should be the same as the primary (except for replication lag).
Upvotes: 2
Views: 2179
Reputation: 37048
The docs are quite clear about it:
--oplog
Creates a file named
oplog.bson
as part of the mongodump output. The oplog.bson
file, located in the top level of the output directory, contains oplog entries that occur during the mongodump operation. This file provides an effective point-in-time snapshot of the state of a mongod instance. To restore to a specific point-in-time backup, use the output created with this option in conjunction with mongorestore --oplogReplay.
Without
--oplog
, if there are write operations during the dump operation, the dump will not reflect a single moment in time. Changes made to the database during the update process can affect the output of the backup.
--oplog
has no effect when running mongodump against a mongos instance to dump the entire contents of a sharded cluster. However, you can use --oplog
to dump individual shards.
Without --oplog
you still get a valid dump, just a slightly inconsistent one - some of the writes made between 1 AM and 2 AM will be missing.
With --oplog
you also get the oplog file, captured up to 2 AM. The dump itself is still inconsistent, but replaying the oplog on restore fixes that, bringing the data to its state as of 2 AM.
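For the 1 AM / 2 AM scenario from the question, the pair of commands could look roughly like this. The host and output path are hypothetical placeholders; adjust them for your deployment:

```shell
#!/bin/sh
# Hypothetical host and backup directory - substitute your own.
HOST="localhost:27017"
OUT="/backups/$(date +%Y%m%d)"

# Dump started at 1 AM: writes that arrive while it runs are captured
# into $OUT/oplog.bson alongside the per-collection dumps.
mongodump --host "$HOST" --oplog --out "$OUT"

# Restore the dump, then replay oplog.bson so the restored data
# reflects the moment the dump finished (2 AM), not a mix of states.
mongorestore --host "$HOST" --oplogReplay "$OUT"
```

Note that --oplogReplay takes the dump directory produced with --oplog; without the oplog.bson file there is nothing to replay.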
The problems with dumping sharded clusters deserve a dedicated page in the docs, essentially because of the complexity of synchronising the backups of all nodes:
To create backups of a sharded cluster, you will stop the cluster balancer, take a backup of the config database, and then take backups of each shard in the cluster using mongodump to capture the backup data. To capture a more exact moment-in-time snapshot of the system, you will need to stop all application writes before taking the filesystem snapshots; otherwise the snapshot will only approximate a moment in time.
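The quoted procedure could be sketched as follows. All hostnames and paths are hypothetical, and mongosh --eval is assumed to be available for toggling the balancer; this is an outline of the steps, not a production backup script:

```shell
#!/bin/sh
# 1. Stop the balancer so chunks do not migrate mid-backup.
mongosh --host mongos.example:27017 --eval 'sh.stopBalancer()'

# 2. Back up the config database from a config server.
mongodump --host cfg1.example:27019 --db config --out /backups/config

# 3. Dump each shard individually; --oplog works per shard
#    (it has no effect when dumping through mongos).
for shard in shard1.example:27018 shard2.example:27018; do
  mongodump --host "$shard" --oplog --out "/backups/${shard%%:*}"
done

# 4. Restart the balancer once all dumps are finished.
mongosh --host mongos.example:27017 --eval 'sh.startBalancer()'
```

Even with the balancer stopped, the per-shard dumps start at slightly different times, which is why the docs say this only approximates a single moment in time unless application writes are stopped as well.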
There are no such problems with dumping a replica set.
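To address the question's point about offloading the primary, the dump can be directed at a secondary with --readPreference; the replica-set name and member hostnames below are hypothetical:

```shell
#!/bin/sh
# Connect via the replica set (hypothetical name rs0 and members),
# routing reads to a secondary so the primary is not loaded.
# --oplog still works: secondaries maintain their own oplog.
mongodump \
  --host "rs0/db1.example:27017,db2.example:27017,db3.example:27017" \
  --readPreference=secondary \
  --oplog \
  --out /backups/rs0
```

The resulting dump lags the primary by the secondary's replication lag, exactly as the question anticipates, but it is internally consistent once the oplog is replayed on restore.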
Upvotes: 7