Reputation: 4649
I have an array and it contains duplicate values in BOTH the ID's, is there a way to remove one of the duplicate array item?
userName: "abc",
_id: 10239201141,
rounds:
[{
"roundId": "foo",
"money": "123
},// Keep one of these
{// Keep one of these
"roundId": "foo",
"money": "123
},
{
"roundId": "foo",
"money": "321 // Not a duplicate.
}]
I'd like to remove one of the first two, and keep the third because the id and money are not duplicated in the array.
Thank you in advance!
Edit I found:
db.users.ensureIndex({'rounds.roundId':1, 'rounds.money':1}, {unique:true, dropDups:true})
This doesn't help me. Can someone help me? I spent hours trying to figure this out.
The thing is, I ran my node.js website on two machines so it was pushing the same data twice. Knowing this, the duplicate data should be 1 index away. I made a simple for loop that can detect if there is duplicate data in my situation, how could I implement this with mongodb so it removes an array object AT that array index?
for (var i in data){
var tempRounds = data[i]['rounds'];
for (var ii in data[i]['rounds']){
var currentArrayItem = data[i]['rounds'][ii - 1];
if (tempRounds[ii - 1]) {
if (currentArrayItem.roundId == tempRounds[ii - 1].roundId && currentArrayItem.money == tempRounds[ii - 1].money) {
console.log("Found a match");
}
}
}
}
Upvotes: 0
Views: 1805
Reputation: 11671
Use an aggregation framework to compute a deduplicated version of each document:
db.test.aggregate([
{ "$unwind" : "$stats" },
{ "$group" : { "_id" : "$_id", "stats" : { "$addToSet" : "$stats" } } }, // use $first to add in other document fields here
{ "$out" : "some_other_collection_name" }
])
Use $out
to put the results in another collection, since aggregation cannot update documents. You can use db.collection.renameCollection
with dropTarget
to replace the old collection with the new deduplicated one. Be sure you're doing the right thing before you scrap the old data, though.
Warnings:
1: This does not preserve the order of elements in the stats
array. If you need to preserve order, you will have retrieve each document from the database, manually deduplicate the array client-side, then update the document in the database.
2: The following two objects won't be considered duplicates of each other:
{ "id" : "foo", "price" : 123 }
{ "price" : 123, "id" : foo" }
If you think you have mixed key orders, use a $project
to enforce a key order between the $unwind
stage and the $group
stage:
{ "$project" : { "stats" : { "id_" : "$stats.id", "price_" : "$stats.price" } } }
Make sure to change id -> id_
and price -> price_
in the rest of the pipeline and rename them back to id
and price
at the end, or rename them in another $project
after the swap. I discovered that, if you do not give different names to the fields in the project, it doesn't reorder them, even though key order is meaningful in an object in MongoDB:
> db.test.drop()
> db.test.insert({ "a" : { "x" : 1, "y" : 2 } })
> db.test.aggregate([
{ "$project" : { "_id" : 0, "a" : { "y" : "$a.y", "x" : "$a.x" } } }
])
{ "a" : { "x" : 1, "y" : 2 } }
> db.test.aggregate([
{ "$project" : { "_id" : 0, "a" : { "y_" : "$a.y", "x_" : "$a.x" } } }
])
{ "a" : { "y_" : 2, "x_" : 1 } }
Since the key order is meaningful, I'd consider this a bug, but it's easy to work around.
Upvotes: 1