Reputation: 3191
I'm looking for the simplest equivalent of taking the shasum of the contents of two files. I don't want to compare every document through an eval function, as this post suggests:
How to compare 2 mongodb collections?
MongoDB's sharding/replication functionality must already have an efficient built-in method for this, but I can't find a documentation entry on how to access it for the purpose of comparison.
Upvotes: 0
Views: 2628
Reputation: 65333
The Sharding/Replication functionality of Mongodb must already have an efficient method built-in for this, but I don't see a doc entry on how one would access this for the purpose of comparison.
Replication relies on an idempotent operation log (oplog) and does not need to calculate checksums on collections. The MongoDB manual goes into further detail on replica set synchronization, including initial sync and ongoing replication.
There is an internal dbHash command used by sharding to calculate the MD5 checksum of all collections in a database. It is intended to verify the consistency of collections in the sharded cluster's config databases, but it can be used to calculate MD5 sums for any database.
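Because dbHash reports a hash per collection, the same idea extends to comparing collections across two databases (or two hosts): run the command against each and compare the per-collection values. The helper below is a minimal sketch in plain JavaScript. compareDbHashes is a hypothetical name, not a MongoDB API, and it assumes each input has a collections map of name to md5, like the sample output shown further down. In mongosh the inputs might come from db.getSiblingDB('dbA').runCommand({ dbHash: 1 }), where dbA is a placeholder database name.

```javascript
// Hypothetical helper (not a MongoDB API): compares the per-collection md5
// values from two dbHash results, e.g. one result per database or per host.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function compareDbHashes(resultA, resultB) {
  const hashesA = resultA.collections || {};
  const hashesB = resultB.collections || {};
  // Union of collection names from both results, sorted for a stable report.
  const names = [...new Set([...Object.keys(hashesA), ...Object.keys(hashesB)])].sort();
  const report = { same: [], different: [], onlyInA: [], onlyInB: [] };
  for (const name of names) {
    if (!has(hashesB, name)) report.onlyInA.push(name);
    else if (!has(hashesA, name)) report.onlyInB.push(name);
    else if (hashesA[name] === hashesB[name]) report.same.push(name);
    else report.different.push(name);
  }
  return report;
}

// Usage with placeholder hash strings (hypothetical data, not real dbHash output):
const exampleA = { collections: { foo1: "aaa", foo2: "bbb" } };
const exampleB = { collections: { foo1: "aaa", foo2: "ccc", foo4: "ddd" } };
console.log(JSON.stringify(compareDbHashes(exampleA, exampleB)));
// prints {"same":["foo1"],"different":["foo2"],"onlyInA":[],"onlyInB":["foo4"]}
```

Comparing only the top-level md5 would tell you whether the databases differ at all; comparing the per-collection map tells you which collections differ.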
NOTE: the dbHash command is explicitly not part of the stable client-facing API, so its implementation and API are subject to change without notice between MongoDB releases.
Since this command iterates over all of the documents in all collections of the selected database, exercise caution when running it on large or production databases (especially if your data set is larger than RAM). Reading a large number of documents can temporarily page useful data and indexes out of your in-memory working set.
The command output will look similar to the following:
> db.runCommand('dbHash')
{
"numCollections": 3,
"host": "nibbler.local",
"collections": {
"foo1": "adf5db735ce0ac74c35a561675614676",
"foo2": "adf5db735ce0ac74c35a561675614676",
"foo3": "26e01a5da467064790a61108170a3b5c"
},
"md5": "f8f53a3cde773a61f5b4ccf4c3d99e07",
"timeMillis": 0,
"fromCache": [ ],
"ok": 1
}
In this example the foo1 and foo2 collections have identical md5 sums, while foo3 differs.
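To spot identical collections within a single result programmatically, you can group the collections map by hash. This is a minimal sketch in plain JavaScript, assuming the output shape shown above; findIdenticalCollections is a hypothetical helper, not part of MongoDB. Matching hashes indicate identical hashed contents, barring an MD5 collision.

```javascript
// Hypothetical helper: groups the collections of one dbHash result by md5
// and returns only the groups where two or more collections share a hash.
function findIdenticalCollections(result) {
  const byHash = new Map();
  for (const [name, hash] of Object.entries(result.collections || {})) {
    if (!byHash.has(hash)) byHash.set(hash, []);
    byHash.get(hash).push(name);
  }
  return [...byHash.values()]
    .filter((names) => names.length > 1) // keep only shared hashes
    .map((names) => names.sort());
}

// Using the collections map from the sample output above:
const sample = {
  collections: {
    foo1: "adf5db735ce0ac74c35a561675614676",
    foo2: "adf5db735ce0ac74c35a561675614676",
    foo3: "26e01a5da467064790a61108170a3b5c",
  },
};
console.log(JSON.stringify(findIdenticalCollections(sample)));
// prints [["foo1","foo2"]]
```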
Upvotes: 8