ECMAScript

Reputation: 4649

Remove duplicate array objects mongodb

I have an array that contains items where BOTH fields are duplicated. Is there a way to remove one of the duplicate array items?

{
   userName: "abc",
   _id: 10239201141,
   rounds:
      [{
         "roundId": "foo",
         "money": "123"
      }, // Keep one of these
      { // Keep one of these
         "roundId": "foo",
         "money": "123"
      },
      {
         "roundId": "foo",
         "money": "321" // Not a duplicate.
      }]
}

I'd like to remove one of the first two and keep the third, because its roundId/money combination is not duplicated in the array.

Thank you in advance!

Edit I found:

db.users.ensureIndex({'rounds.roundId':1, 'rounds.money':1}, {unique:true, dropDups:true})

This doesn't help me. Can someone help me? I spent hours trying to figure this out.

The thing is, I ran my node.js website on two machines so it was pushing the same data twice. Knowing this, the duplicate data should be 1 index away. I made a simple for loop that can detect if there is duplicate data in my situation, how could I implement this with mongodb so it removes an array object AT that array index?

for (var i in data) {
    var tempRounds = data[i]['rounds'];
    for (var ii in tempRounds) {
        var currentArrayItem = tempRounds[ii]; // the item at this index
        var previousArrayItem = tempRounds[ii - 1]; // undefined for the first element
        if (previousArrayItem) {
            if (currentArrayItem.roundId == previousArrayItem.roundId && currentArrayItem.money == previousArrayItem.money) {
                console.log("Found a match");
            }
        }
    }
}
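The loop above only logs matches. A minimal sketch of the same "duplicate is 1 index away" idea that actually removes the adjacent duplicate might look like this (the sample data is taken from the question; the function name is mine):

```javascript
// Collapse adjacent duplicate rounds (same roundId AND same money),
// keeping the first of each adjacent pair.
function dedupeAdjacentRounds(rounds) {
    var result = [];
    for (var i = 0; i < rounds.length; i++) {
        var prev = result[result.length - 1];
        if (prev && prev.roundId === rounds[i].roundId && prev.money === rounds[i].money) {
            continue; // skip the adjacent duplicate
        }
        result.push(rounds[i]);
    }
    return result;
}

// The array from the question collapses from 3 entries to 2.
var rounds = [
    { roundId: "foo", money: "123" },
    { roundId: "foo", money: "123" },
    { roundId: "foo", money: "321" }
];
console.log(dedupeAdjacentRounds(rounds).length); // 2
```

The cleaned array can then be written back to the document with an update, e.g. a `$set` on the `rounds` field.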

Upvotes: 0

Views: 1805

Answers (1)

wdberkeley

Reputation: 11671

Use the aggregation framework to compute a deduplicated version of each document (the array field is called stats here; substitute your own field name, e.g. rounds):

db.test.aggregate([
    { "$unwind" : "$stats" },
    { "$group" : { "_id" : "$_id", "stats" : { "$addToSet" : "$stats" } } }, // use $first to add in other document fields here
    { "$out" : "some_other_collection_name" }
])

Use $out to put the results in another collection, since aggregation cannot update documents in place. You can then use db.collection.renameCollection with dropTarget : true to replace the old collection with the new, deduplicated one. Be sure you're doing the right thing before you scrap the old data, though.
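For example, a minimal mongo shell fragment, assuming the collection names from the pipeline above (test and some_other_collection_name):

```javascript
// Mongo shell fragment (runs in the shell, not standalone Node):
// atomically replace the original collection with the $out output.
// The second argument (dropTarget = true) drops "test" before renaming.
db.some_other_collection_name.renameCollection("test", true)
```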

Warnings:

1: This does not preserve the order of elements in the stats array ($addToSet builds an unordered set). If you need to preserve order, you will have to retrieve each document from the database, deduplicate the array client-side, then update the document in the database.
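An order-preserving, client-side dedup can be sketched like this (the field names match the question; the write-back at the end is a hypothetical example for the question's users collection, not something this snippet executes):

```javascript
// Keep the first occurrence of each (roundId, money) pair, in order.
function dedupePreservingOrder(rounds) {
    var seen = {};
    return rounds.filter(function (r) {
        var key = r.roundId + "\u0000" + r.money; // separator avoids key collisions
        if (seen[key]) return false;
        seen[key] = true;
        return true;
    });
}

var rounds = [
    { roundId: "foo", money: "123" },
    { roundId: "bar", money: "9" },
    { roundId: "foo", money: "123" } // duplicate of the first entry
];
var deduped = dedupePreservingOrder(rounds); // first two entries survive, in order

// Hypothetical write-back for the question's schema (collection and
// field names are assumptions from the question, not executed here):
// db.users.update({ _id: doc._id }, { $set: { rounds: deduped } });
```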

2: The following two objects won't be considered duplicates of each other:

{ "id" : "foo", "price" : 123 }
{ "price" : 123, "id" : "foo" }

If you think you have mixed key orders, use a $project to enforce a key order between the $unwind stage and the $group stage:

{ "$project" : { "stats" : { "id_" : "$stats.id", "price_" : "$stats.price" } } }

Make sure to change id -> id_ and price -> price_ in the rest of the pipeline, and rename them back to id and price at the end (or rename them in another $project after the swap). I discovered that, if you do not give the fields different names in the $project, it does not reorder them, even though key order is meaningful in an object in MongoDB:

> db.test.drop()
> db.test.insert({ "a" : { "x" : 1, "y" : 2 } })
> db.test.aggregate([
    { "$project" : { "_id" : 0, "a" : { "y" : "$a.y", "x" : "$a.x" } } }
])
{ "a" : { "x" : 1, "y" : 2 } }
> db.test.aggregate([
    { "$project" : { "_id" : 0, "a" : { "y_" : "$a.y", "x_" : "$a.x" } } }
])
{ "a" : { "y_" : 2, "x_" : 1 } }

Since the key order is meaningful, I'd consider this a bug, but it's easy to work around.

Upvotes: 1
