Reputation: 3397
Is it safe to run mongodump
against a running server with many writes per second? Is it possible to get a corrupted dump this way?
From here:
Use
--oplog
to capture incoming write operations during the mongodump operation to ensure that the backups reflect a consistent data state.
Does it mean that, no matter how many writes the database receives, the dump will be consistent?
If I run mongodump --oplog
at 1 AM and it finishes at 2 AM, and then I run mongorestore --oplogReplay
— what state will I get?
From here:
However, the use of
mongodump
and mongorestore
as a backup strategy can be problematic for sharded clusters and replica sets.
but why? I have a replica set of 1 primary and 2 secondaries. What is the problem with running mongodump
against one of the secondaries? It should be the same as the primary (except for replication lag).
Upvotes: 2
Views: 2179
Reputation: 37048
The docs are quite clear about it:
--oplog
Creates a file named
oplog.bson
as part of the mongodump output. The oplog.bson
file, located in the top level of the output directory, contains oplog entries that occur during the mongodump operation. This file provides an effective point-in-time snapshot of the state of a mongod instance. To restore to a specific point-in-time backup, use the output created with this option in conjunction with mongorestore --oplogReplay.
Without
--oplog
, if there are write operations during the dump operation, the dump will not reflect a single moment in time. Changes made to the database during the update process can affect the output of the backup.
--oplog
has no effect when running mongodump against a mongos instance to dump the entire contents of a sharded cluster. However, you can use --oplog
to dump individual shards.
Without --oplog
you still get a valid dump, just a slightly inconsistent one - some of the writes made between 1 AM and 2 AM will be missing.
With --oplog
you also get the oplog file, captured up to 2 AM. The dump itself is still inconsistent, but replaying the oplog on restore fixes that, bringing the data to its state as of 2 AM.
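For the 1 AM / 2 AM scenario from the question, the pair of commands could look roughly like this. The host and output path are hypothetical placeholders; adjust them for your deployment:

```shell
#!/bin/sh
# Hypothetical host and backup directory - substitute your own.
HOST="localhost:27017"
OUT="/backups/$(date +%Y%m%d)"

# Dump started at 1 AM: writes that arrive while it runs are captured
# into $OUT/oplog.bson alongside the per-collection dumps.
mongodump --host "$HOST" --oplog --out "$OUT"

# Restore the dump, then replay oplog.bson so the restored data
# reflects the moment the dump finished (2 AM), not a mix of states.
mongorestore --host "$HOST" --oplogReplay "$OUT"
```

Note that --oplogReplay takes the dump directory produced with --oplog; without the oplog.bson file there is nothing to replay.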
The problems with dumping sharded clusters deserve a dedicated page in the docs, essentially because of the complexity of synchronising the backups of all nodes:
To create backups of a sharded cluster, you will stop the cluster balancer, take a backup of the config database, and then take backups of each shard in the cluster using mongodump to capture the backup data. To capture a more exact moment-in-time snapshot of the system, you will need to stop all application writes before taking the filesystem snapshots; otherwise the snapshot will only approximate a moment in time.
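The quoted procedure could be sketched as follows. All hostnames and paths are hypothetical, and mongosh --eval is assumed to be available for toggling the balancer; this is an outline of the steps, not a production backup script:

```shell
#!/bin/sh
# 1. Stop the balancer so chunks do not migrate mid-backup.
mongosh --host mongos.example:27017 --eval 'sh.stopBalancer()'

# 2. Back up the config database from a config server.
mongodump --host cfg1.example:27019 --db config --out /backups/config

# 3. Dump each shard individually; --oplog works per shard
#    (it has no effect when dumping through mongos).
for shard in shard1.example:27018 shard2.example:27018; do
  mongodump --host "$shard" --oplog --out "/backups/${shard%%:*}"
done

# 4. Restart the balancer once all dumps are finished.
mongosh --host mongos.example:27017 --eval 'sh.startBalancer()'
```

Even with the balancer stopped, the per-shard dumps start at slightly different times, which is why the docs say this only approximates a single moment in time unless application writes are stopped as well.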
There are no such problems with dumping a replica set.
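To address the question's point about offloading the primary, the dump can be directed at a secondary with --readPreference; the replica-set name and member hostnames below are hypothetical:

```shell
#!/bin/sh
# Connect via the replica set (hypothetical name rs0 and members),
# routing reads to a secondary so the primary is not loaded.
# --oplog still works: secondaries maintain their own oplog.
mongodump \
  --host "rs0/db1.example:27017,db2.example:27017,db3.example:27017" \
  --readPreference=secondary \
  --oplog \
  --out /backups/rs0
```

The resulting dump lags the primary by the secondary's replication lag, exactly as the question anticipates, but it is internally consistent once the oplog is replayed on restore.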
Upvotes: 7