Reputation: 415
I have two mongo collections, one which refers to the production env and the other to the testing env.
How can I compare a diff between my two collections?
I tried dumping them to a bson and then converting to a json. But I cant just perform a simple diff on them since the sorting might vary and the json file is way too large to be sorted.
Upvotes: 34
Views: 51998
Reputation: 61666
Starting Mongo 4.4
, the aggregation framework provides a new $unionWith
stage, performing the union of two collections (the combined pipeline results from two collections into a single result set).
Making it way easier to find the diff between two collections:
// > db.test.find()
// { "a" : 9, "b" : 2 }
// { "a" : 4, "b" : 12 }
// { "a" : 3, "b" : 5 }
// { "a" : 0, "b" : 7 }
// { "a" : 7, "b" : 12 }
// > db.prod.find()
// { "a" : 3, "b" : 5 }
// { "a" : 4, "b" : 12 }
// { "a" : 3, "b" : 5 }
// { "a" : 0, "b" : 7 }
db.test.aggregate(
{ $unset: "_id" },
{ $project: { from: "test", doc: "$$ROOT" } },
{ $unionWith: {
coll: "prod",
pipeline: [
{ $unset: "_id" },
{ $project: { from: "prod", doc: "$$ROOT" } }
]
}},
{ $group: {
_id: "$doc",
test: { $sum: { $cond: [ { $eq: ["$from", "test"] }, 1, 0 ] } },
prod: { $sum: { $cond: [ { $eq: ["$from", "prod"] }, 1, 0 ] } }
}},
{ $match: { $expr: { $ne: ["$test", "$prod"] } } }
)
// { "_id" : { "a" : 7, "b" : 12 }, "test" : 1, "prod" : 0 }
// { "_id" : { "a" : 9, "b" : 2 }, "test" : 1, "prod" : 0 }
// { "_id" : { "a" : 3, "b" : 5 }, "test" : 1, "prod" : 2 }
This:
$unset
s the _id
in order to latter be able to $group
documents by themselves without considering the _id
(as it might be different in the other collection).$project
s the field from
whose value is the collection the document comes from (test
or prod
), in order, latter when we merge the two collections, to keep track from where documents originated from.$project
s the field doc
whose value is the document itself (thanks to the $$ROOT
variable). This is the field that will be used to $group
documents together.$unionWith
the prod
collection in order to merge documents from both collections into the same aggregation pipeline. The pipeline
parameter is an optional aggregation pipeline applied on documents from the collection being merged (prod
) before documents are inserted into the downstream pipeline. And we're applying the same $unset
/$project
stages we've applied on test
documents.$group
s test
and prod
documents based on the doc
field that we've created to represent the actual document. And we accumulate the two fields test
and prod
as the $sum
(count) of grouped documents originating from one or the other collection (via $cond
if expressions).$match
es resulting grouped elements by only keeping items not having the same number of test and prod documents: the actual diff between the two collections.Upvotes: 13
Reputation: 10794
mongoexport
now has a --sort
option:
For example:
$ mongo
test> db.coll.insertMany([
{ _id: 0, name: 'Alex' },
{ _id: 1, name: 'Bart' },
{ _id: 2, name: 'Maria' },
{ _id: 3, name: 'Aristotle' },
]);
{
"acknowledged": true,
"insertedIds": [
0,
1,
2,
3
]
}
Export it:
mongoexport -d test -c coll --sort "{name: 1}"
2018-10-25T15:50:07.210+0300 connected to: localhost
{"_id":0.0,"name":"Alex"}
{"_id":3.0,"name":"Aristotle"}
{"_id":1.0,"name":"Bart"}
{"_id":2.0,"name":"Maria"}
2018-10-25T15:50:07.210+0300 exported 4 records
mongoexport -d test -c coll --sort "{name: -1}"
2018-10-25T15:49:42.010+0300 connected to: localhost
{"_id":2.0,"name":"Maria"}
{"_id":1.0,"name":"Bart"}
{"_id":3.0,"name":"Aristotle"}
{"_id":0.0,"name":"Alex"}
2018-10-25T15:49:42.011+0300 exported 4 records
After you exported the collections you can perform a diff
in the command line or use a graphical user interface (GUI) tool like Beyond Compare.
FYI: if your documents might have the same data but have different _id
values:
You can exclude the _id
field when exporting like this: https://stackoverflow.com/a/49895549/728287
Upvotes: 1
Reputation: 883
Using the Kevin Smith response, I have a new version, only for compare and return those ID's who the collectionB
don't have comparing with the collectionA
. And save the result in a collectionC
when you have lots records.
db.collectionA.find().forEach(function(doc1){
var doc2 = db.collectionB.findOne({_id: doc1._id});
if (!(doc2)) {
db.collectionC.insert(doc1);
}
});
Upvotes: 10
Reputation: 17339
If you need to compare only a subset of fields (e.g. you don't need to compare ids), you can do it the following way. Export the collections to csv, specifying the fields to compare (source):
mongoexport -d <db_name> -c <col_name> --fields "field1,field2" --type=csv | sort > export.csv
And then do a simple diff
on the csv files. Note that column order in the csv file corresponds to the --field
option.
Pros:
Cons:
mongoexport
can be slow for huge databases.To get all fields in all documents in a collection, see this answer.
Upvotes: 5
Reputation: 7982
The dbHash has done the trick:
use db_name
db.runCommand('dbHash')
It returns the hash values for each collection. You can then compare them. It's pretty accurate.
Upvotes: 11
Reputation: 83
use Studio 3T to compare mongodb. you can compare collections,db,single records as well. just you need to download and connect mongo. here is download link https://studio3t.com/
Upvotes: 0
Reputation: 14436
Try the following in the shell, it will iterate each item within a collection and try to match each document based on ID.
Say we have 2 collections db.col1
and db.col2
:
> db.col1.find()
{ "_id" : 1, "item" : 1 }
{ "_id" : 2, "item" : 2 }
{ "_id" : 3, "item" : 3 }
{ "_id" : 4, "item" : 4 }
> db.col2.find()
{ "_id" : 1, "item" : 1 }
{ "_id" : 2, "item" : 2 }
{ "_id" : 3, "item" : 3 }
{ "_id" : 4, "item" : 4 }
We can then create a javascript function to compare 2 collections
function compareCollection(col1, col2){
if(col1.count() !== col2.count()){
return false;
}
var same = true;
var compared = col1.find().forEach(function(doc1){
var doc2 = col2.findOne({_id: doc1._id});
same = same && JSON.stringify(doc1)==JSON.stringify(doc2);
});
return same;
}
Then call is like the following:
> compareCollection(db.col1, db.col2)
true
If we then have a 3rd collections db.col3
> db.col3.find()
{ "_id" : 1, "item" : 1 }
And compare this one
> compareCollection(db.col1, db.col3)
false
we'll get the expected result.
If we also have a 4th collection which has matching documents but diffrent data db.col4
> db.col4.find()
{ "_id" : 1, "item" : 10 }
{ "_id" : 2, "item" : 2 }
{ "_id" : 3, "item" : 3 }
{ "_id" : 4, "item" : 4 }
This will also return false
> compareCollection(db.col1, db.col4)
false
Upvotes: 17