Karan Jain
Karan Jain

Reputation: 415

Compare a mongo diff on two collections

I have two mongo collections, one which refers to the production env and the other to the testing env.

How can I compare a diff between my two collections?

I tried dumping them to a bson and then converting to a json. But I cant just perform a simple diff on them since the sorting might vary and the json file is way too large to be sorted.

Upvotes: 34

Views: 51998

Answers (7)

Xavier Guihot
Xavier Guihot

Reputation: 61666

Starting Mongo 4.4, the aggregation framework provides a new $unionWith stage, performing the union of two collections (the combined pipeline results from two collections into a single result set).

Making it way easier to find the diff between two collections:

// > db.test.find()
//    { "a" : 9, "b" : 2  }
//    { "a" : 4, "b" : 12 }
//    { "a" : 3, "b" : 5  }
//    { "a" : 0, "b" : 7  }
//    { "a" : 7, "b" : 12 }
// > db.prod.find()
//    { "a" : 3, "b" : 5  }
//    { "a" : 4, "b" : 12 }
//    { "a" : 3, "b" : 5  }
//    { "a" : 0, "b" : 7  }
db.test.aggregate(
  { $unset: "_id" },
  { $project: { from: "test", doc: "$$ROOT" } },
  { $unionWith: {
      coll: "prod",
      pipeline: [
        { $unset: "_id" },
        { $project: { from: "prod", doc: "$$ROOT" } }
      ]
  }},
  { $group: {
      _id: "$doc",
      test: { $sum: { $cond: [ { $eq: ["$from", "test"] }, 1, 0 ] } },
      prod: { $sum: { $cond: [ { $eq: ["$from", "prod"] }, 1, 0 ] } }
  }},
  { $match: { $expr: { $ne: ["$test", "$prod"] } } }
)
// { "_id" : { "a" : 7, "b" : 12 }, "test" : 1, "prod" : 0 }
// { "_id" : { "a" : 9, "b" : 2  }, "test" : 1, "prod" : 0 }
// { "_id" : { "a" : 3, "b" : 5  }, "test" : 1, "prod" : 2 }

This:

  • $unsets the _id in order to latter be able to $group documents by themselves without considering the _id (as it might be different in the other collection).
  • $projects the field from whose value is the collection the document comes from (test or prod), in order, latter when we merge the two collections, to keep track from where documents originated from.
  • Also $projects the field doc whose value is the document itself (thanks to the $$ROOT variable). This is the field that will be used to $group documents together.
  • $unionWith the prod collection in order to merge documents from both collections into the same aggregation pipeline. The pipeline parameter is an optional aggregation pipeline applied on documents from the collection being merged (prod) before documents are inserted into the downstream pipeline. And we're applying the same $unset/$project stages we've applied on test documents.
  • $groups test and prod documents based on the doc field that we've created to represent the actual document. And we accumulate the two fields test and prod as the $sum (count) of grouped documents originating from one or the other collection (via $cond if expressions).
  • $matches resulting grouped elements by only keeping items not having the same number of test and prod documents: the actual diff between the two collections.

Upvotes: 13

Gianfranco P
Gianfranco P

Reputation: 10794

mongoexport now has a --sort option:

For example:

$ mongo
test> db.coll.insertMany([
  { _id: 0, name: 'Alex' },
  { _id: 1, name: 'Bart' },
  { _id: 2, name: 'Maria' },
  { _id: 3, name: 'Aristotle' },
]);
{
  "acknowledged": true,
  "insertedIds": [
    0,
    1,
    2,
    3
  ]
}

Export it:

mongoexport -d test -c coll --sort "{name: 1}"
2018-10-25T15:50:07.210+0300    connected to: localhost
{"_id":0.0,"name":"Alex"}
{"_id":3.0,"name":"Aristotle"}
{"_id":1.0,"name":"Bart"}
{"_id":2.0,"name":"Maria"}
2018-10-25T15:50:07.210+0300    exported 4 records

mongoexport -d test -c coll --sort "{name: -1}"
2018-10-25T15:49:42.010+0300    connected to: localhost
{"_id":2.0,"name":"Maria"}
{"_id":1.0,"name":"Bart"}
{"_id":3.0,"name":"Aristotle"}
{"_id":0.0,"name":"Alex"}
2018-10-25T15:49:42.011+0300    exported 4 records

After you exported the collections you can perform a diff in the command line or use a graphical user interface (GUI) tool like Beyond Compare.

FYI: if your documents might have the same data but have different _id values: You can exclude the _id field when exporting like this: https://stackoverflow.com/a/49895549/728287

Upvotes: 1

Carlos Rodriguez
Carlos Rodriguez

Reputation: 883

Using the Kevin Smith response, I have a new version, only for compare and return those ID's who the collectionB don't have comparing with the collectionA. And save the result in a collectionC when you have lots records.

    db.collectionA.find().forEach(function(doc1){
        var doc2 = db.collectionB.findOne({_id: doc1._id});
        if (!(doc2)) {
                db.collectionC.insert(doc1);
        }
    });

Upvotes: 10

Dennis Golomazov
Dennis Golomazov

Reputation: 17339

If you need to compare only a subset of fields (e.g. you don't need to compare ids), you can do it the following way. Export the collections to csv, specifying the fields to compare (source):

mongoexport -d <db_name> -c <col_name> --fields "field1,field2" --type=csv | sort > export.csv

And then do a simple diff on the csv files. Note that column order in the csv file corresponds to the --field option.

Pros:

  • you can specify subset of fields to compare.
  • you can see the actual diff of the records.

Cons:

  • to compare the full records, you need to know all possible fields.
  • mongoexport can be slow for huge databases.

To get all fields in all documents in a collection, see this answer.

Upvotes: 5

tsveti_iko
tsveti_iko

Reputation: 7982

The dbHash has done the trick:

use db_name
db.runCommand('dbHash')

It returns the hash values for each collection. You can then compare them. It's pretty accurate.

Upvotes: 11

Jigar Prajapati
Jigar Prajapati

Reputation: 83

use Studio 3T to compare mongodb. you can compare collections,db,single records as well. just you need to download and connect mongo. here is download link https://studio3t.com/

Upvotes: 0

Kevin Smith
Kevin Smith

Reputation: 14436

Try the following in the shell, it will iterate each item within a collection and try to match each document based on ID.

Say we have 2 collections db.col1 and db.col2:

> db.col1.find()
{ "_id" : 1, "item" : 1 }
{ "_id" : 2, "item" : 2 }
{ "_id" : 3, "item" : 3 }
{ "_id" : 4, "item" : 4 }

> db.col2.find()
{ "_id" : 1, "item" : 1 }
{ "_id" : 2, "item" : 2 }
{ "_id" : 3, "item" : 3 }
{ "_id" : 4, "item" : 4 }

We can then create a javascript function to compare 2 collections

function compareCollection(col1, col2){
    if(col1.count() !== col2.count()){
        return false;
    }

    var same = true;

    var compared = col1.find().forEach(function(doc1){
        var doc2 = col2.findOne({_id: doc1._id});

        same = same && JSON.stringify(doc1)==JSON.stringify(doc2);
    });

    return same;
}

Then call is like the following:

> compareCollection(db.col1, db.col2)
true

If we then have a 3rd collections db.col3

> db.col3.find()
{ "_id" : 1, "item" : 1 }

And compare this one

> compareCollection(db.col1, db.col3)
false

we'll get the expected result.

If we also have a 4th collection which has matching documents but diffrent data db.col4

> db.col4.find()
{ "_id" : 1, "item" : 10 }
{ "_id" : 2, "item" : 2 }
{ "_id" : 3, "item" : 3 }
{ "_id" : 4, "item" : 4 }

This will also return false

> compareCollection(db.col1, db.col4)
false

Upvotes: 17

Related Questions