babeyh

Reputation: 699

Remove duplicates from a MongoDB collection on multiple keys

I have a MongoDB collection and want to remove documents that are duplicated on two key fields.

db.getCollection("collection").aggregate([
{
    // only match documents that have this field;
    // you can omit this stage if no documents are missing user_id
    $match: {"user_id": {$nin: [null]}}
},
{
    $group: { "_id": "$user_id", "doc" : {"$first": "$$ROOT"}}
},
{
    $replaceRoot: { "newRoot": "$doc"}
},
{$out: "collection2"}
],
{allowDiskUse:true}
)

The query above (taken from this solution) works for a single key field.

How can I edit it to deduplicate on two fields?

Sample collection;

   repo_id    user_id       
0  667006     1060
1  667006     1060 #duplicated ! repo_id and user_id
2  667006     2467194
3  667006     21979

Desired output;

   repo_id    user_id       
0  667006     1060
1  667006     2467194
2  667006     21979

Upvotes: 0

Views: 127

Answers (1)

whoami - fakeFaceTrueSoul

Reputation: 17915

All you need to change is the $group stage: group on unique pairs of repo_id and user_id by using a compound _id.

Replace the $group stage with the below:

{
   $group: { _id: {repo_id: '$repo_id',user_id: "$user_id"} , doc: { $first: "$$ROOT" } }
}
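Plugged back into the original pipeline, the full query would look like this (a sketch, untested against a live server; the $match stage is extended to also filter missing repo_id, and the output collection name collection2 is kept from the question):

```
db.getCollection("collection").aggregate([
  // skip documents missing either key; omit if both fields always exist
  { $match: { repo_id: { $nin: [null] }, user_id: { $nin: [null] } } },
  // one group per unique (repo_id, user_id) pair; keep the first doc seen
  { $group: { _id: { repo_id: "$repo_id", user_id: "$user_id" }, doc: { $first: "$$ROOT" } } },
  // promote the kept document back to the top level
  { $replaceRoot: { newRoot: "$doc" } },
  // write the deduplicated result to a new collection
  { $out: "collection2" }
],
{ allowDiskUse: true })
```

Note that $out replaces collection2 entirely if it already exists, so it is safest to write to a fresh collection name first and verify the result.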

Test: mongoplayground
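If you want to sanity-check the grouping logic without a database, here is a small Node.js sketch that mimics what the compound _id plus $first does on the sample documents. The names docs and dedupeByKeys are illustrative, not part of MongoDB's API:

```javascript
// Sample documents from the question.
const docs = [
  { repo_id: 667006, user_id: 1060 },
  { repo_id: 667006, user_id: 1060 },    // duplicate (repo_id, user_id) pair
  { repo_id: 667006, user_id: 2467194 },
  { repo_id: 667006, user_id: 21979 },
];

// Keep the first document seen for each unique combination of key fields,
// analogous to $group with a compound _id and { $first: "$$ROOT" }.
function dedupeByKeys(items, keys) {
  const seen = new Map();
  for (const item of items) {
    const id = JSON.stringify(keys.map((k) => item[k])); // compound group key
    if (!seen.has(id)) seen.set(id, item);
  }
  return [...seen.values()];
}

console.log(dedupeByKeys(docs, ["repo_id", "user_id"]));
// the duplicate 1060 row is dropped, leaving 3 documents
```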

Upvotes: 1

Related Questions