Comparing two unsorted collections in MongoDB

Question

I am trying to compare a large number of documents in two collections. To give you an estimate, I have around 1300 documents in each of the two collections.

I want to generate a diff report after comparing the two collections. I do not need to point out exactly what is missing or what has been added; I just need to identify that there is in fact some difference between the two documents. Yes, I do have a unique identifier for each document other than Mongo's ObjectId ("_id").

Note: I have implemented the database using the denormalized data model, which means I have embedded documents (documents within documents).

What would you say is the best way to go about implementing this?

Thank you in advance for your time, samaritans!

Tom Slabbaert · Accepted Answer

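You should use $lookup with $eq on all the fields you care about. For each document in collection1, the lookup collects the documents in collection2 that have the same unique identifier and equal values in every compared field. If the "matches" array comes back empty, the document is either missing from collection2 or differs in at least one field, and the final $match keeps exactly those documents.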

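db.collection1.aggregate([
   {
      $lookup:
         {
           from: "collection2",
           // Expose collection1's fields to the sub-pipeline as variables.
           let: { unique_id: "$unique_id", field1: "$field", field2: "$field" /* , ... */ },
           pipeline: [
              { $match:
                 { $expr:
                    { $and:
                       [
                         // Same identifier and equal values in every compared field.
                         { $eq: [ "$unique_id_in_2",  "$$unique_id" ] },
                         { $eq: [ "$field_to_match",  "$$field1" ] },
                         { $eq: [ "$field_to_match.2",  "$$field2" ] }
                       ]
                    }
                 }
              }
           ],
           as: "matches"
         }
    },
   // Keep only documents with no identical counterpart in collection2.
   {
     $match: {
         'matches.0': {$exists: false}
      }
   }
])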

Note: this $lookup syntax (with let and pipeline) requires MongoDB 3.6 or later.
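
With many fields to compare, writing one $eq clause per field by hand gets tedious. A minimal sketch of a helper that generates the same pipeline from a list of field names might look like this. The function name buildDiffPipeline and the variable names v0, v1, ... are hypothetical, and the sketch assumes both collections use the same field names (unlike the unique_id / unique_id_in_2 naming above).

```javascript
// Hypothetical helper: builds a $lookup + $match pipeline that returns the
// documents with no identical counterpart in `otherCollection`.
// Assumption: both collections use the same field names.
function buildDiffPipeline(otherCollection, idField, fields) {
  const names = [idField, ...fields];
  const letVars = {};
  const conditions = [];

  names.forEach((name, i) => {
    // `let` variable names cannot contain dots, so use v0, v1, ...
    // instead of reusing the (possibly nested) field path.
    const varName = "v" + i;
    letVars[varName] = "$" + name;
    conditions.push({ $eq: ["$" + name, "$$" + varName] });
  });

  return [
    {
      $lookup: {
        from: otherCollection,
        let: letVars,
        pipeline: [{ $match: { $expr: { $and: conditions } } }],
        as: "matches"
      }
    },
    // An empty "matches" array means missing or different.
    { $match: { "matches.0": { $exists: false } } }
  ];
}

// Example with placeholder field names; in the mongo shell you would run
// db.collection1.aggregate(pipeline)
const pipeline = buildDiffPipeline("collection2", "unique_id", ["field1", "field2"]);
```

Two things to keep in mind: the pipeline only reports documents from the collection it is run against, so documents that exist only in collection2 show up when you run it the other way round. Also, $eq on an embedded document compares it as a whole, including field order, so you can pass the name of an embedded document instead of listing each of its subfields.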
