Reputation: 1467
I have a MongoDB collection that looks like the one below. As you can see, it contains a number of duplicate records that differ in only a few attributes. The collection holds more than 18,000 documents, and I need to remove all of the duplicates. It doesn't matter which one I keep; I just need no dupes. Can anyone help or point me in the right direction?
{
commonName: "Lionel Messi",
firstName: "Lionel",
lastName: "Messi",
rating: 97
},{
commonName: "Lionel Messi",
firstName: "Lionel",
lastName: "Messi",
rating: 96
},{
commonName: "Lionel Messi",
firstName: "Lionel",
lastName: "Messi",
rating: 92
},{
commonName: "Jamie Vardy",
firstName: "Jamie",
lastName: "Vardy",
rating: 82
},{
commonName: "Jamie Vardy",
firstName: "Jamie",
lastName: "Vardy",
rating: 86
}
Upvotes: 0
Views: 86
Reputation: 188
You could clean your data by adding a unique index. Depending on your MongoDB version, there are two ways to do this.
If your MongoDB version is 2.6 or older (the dropDups option was removed in MongoDB 3.0), you can run this command:
db.players.ensureIndex({'commonName' : 1, 'firstName' :1 }, {unique : true, dropDups : true})
If your version is newer, you could do something like this:
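// Group documents by the fields that define a duplicate.
db.players.aggregate([
    { "$group": {
        "_id": { "commonName": "$commonName", "firstName": "$firstName" },
        "dups": { "$push": "$_id" },   // _ids of every document in the group
        "count": { "$sum": 1 }
    }},
    // Keep only the groups that contain more than one document.
    { "$match": { "count": { "$gt": 1 } }}
]).forEach(function(doc) {
    doc.dups.shift();   // keep the first _id in each group
    db.players.remove({ "_id": { "$in": doc.dups }});   // delete the rest
});
// Prevent new duplicates from being inserted.
db.players.createIndex({ "commonName": 1, "firstName": 1 }, { unique: true })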
Warning: Try this on test data first to make sure you are not removing data that you want to keep.
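As a way to dry-run the grouping logic before touching the database, here is a minimal sketch in plain JavaScript (Node.js, no MongoDB connection). It mirrors the $group / shift() idea on an in-memory array. The _id values and the helper name findDuplicateIds are made up for illustration; they are not part of the original data or of any MongoDB API.

```javascript
// Hypothetical sample data; the _id values are invented for this sketch.
const players = [
  { _id: 1, commonName: "Lionel Messi", firstName: "Lionel", lastName: "Messi", rating: 97 },
  { _id: 2, commonName: "Lionel Messi", firstName: "Lionel", lastName: "Messi", rating: 96 },
  { _id: 3, commonName: "Lionel Messi", firstName: "Lionel", lastName: "Messi", rating: 92 },
  { _id: 4, commonName: "Jamie Vardy", firstName: "Jamie", lastName: "Vardy", rating: 82 },
  { _id: 5, commonName: "Jamie Vardy", firstName: "Jamie", lastName: "Vardy", rating: 86 },
];

// Hypothetical helper: returns the _ids that would be deleted.
function findDuplicateIds(docs) {
  // Collect _ids per (commonName, firstName) key, like the $group stage.
  const groups = new Map();
  for (const doc of docs) {
    const key = JSON.stringify([doc.commonName, doc.firstName]);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc._id);
  }
  // Keep the first _id of each group and mark the rest for removal.
  const toRemove = [];
  for (const ids of groups.values()) {
    toRemove.push(...ids.slice(1));
  }
  return toRemove;
}

const idsToRemove = findDuplicateIds(players);
console.log(idsToRemove); // [ 2, 3, 5 ]
```

Checking the list of _ids first makes it easier to confirm that only the intended documents would be deleted.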
Upvotes: 1
Reputation: 2654
You can use aggregate to clean your data, and then use $out to write the result to a new collection, or even overwrite your current collection:
db.players.aggregate([
    {
        $group : {
            _id : { commonName: "$commonName" },
            commonName: { $first: "$commonName" },
            firstName: { $first: "$firstName" },
            lastName: { $first: "$lastName" },
            rating: { $first: "$rating" }
        }
    },
    { $project : { _id: 0, commonName: 1, firstName: 1, lastName: 1, rating: 1 } },
    { $out : "players" }
])
Note: If you want to write a new collection use { $out : "newCollection" }
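To see what the $group stage with $first keeps, here is a rough plain-JavaScript equivalent that runs without a database. The function name firstPerCommonName and the sample array are assumptions made for this sketch. It keeps the first document seen for each commonName; in MongoDB, which document counts as "first" depends on the order in which documents reach the stage, so add a $sort before $group if that matters.

```javascript
// Hypothetical sample rows taken from the question's data.
const sampleDocs = [
  { commonName: "Lionel Messi", firstName: "Lionel", lastName: "Messi", rating: 97 },
  { commonName: "Lionel Messi", firstName: "Lionel", lastName: "Messi", rating: 96 },
  { commonName: "Jamie Vardy", firstName: "Jamie", lastName: "Vardy", rating: 82 },
  { commonName: "Jamie Vardy", firstName: "Jamie", lastName: "Vardy", rating: 86 },
];

// Hypothetical helper: one output document per commonName,
// built from the first document encountered for that name.
function firstPerCommonName(docs) {
  const seen = new Map();
  for (const { commonName, firstName, lastName, rating } of docs) {
    if (!seen.has(commonName)) {
      seen.set(commonName, { commonName, firstName, lastName, rating });
    }
  }
  return [...seen.values()];
}

const deduped = firstPerCommonName(sampleDocs);
console.log(deduped.map((d) => `${d.commonName}: ${d.rating}`));
```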
Upvotes: 1
Reputation: 985
Create a temporary collection with a unique index on all four fields, then copy the data from the original collection into it. The temporary collection should then contain only unique records. After that, you can clear the original collection and move the records from the temporary collection back into it.
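The effect of the choice of index fields can be sketched in plain JavaScript, with a Map standing in for the temporary collection. The class UniqueCollection and the sample rows (including the exact duplicate) are hypothetical and only illustrate the idea. Note that a unique key over all four fields rejects only exact copies; rows that differ in rating are all kept, so a narrower key may be needed if those count as duplicates too.

```javascript
// Hypothetical stand-in for a collection with a unique compound index.
class UniqueCollection {
  constructor(keyFields) {
    this.keyFields = keyFields;
    this.docs = new Map();
  }
  // Returns false where a real unique index would raise a duplicate key error.
  insert(doc) {
    const key = JSON.stringify(this.keyFields.map((f) => doc[f]));
    if (this.docs.has(key)) return false;
    this.docs.set(key, doc);
    return true;
  }
  get size() {
    return this.docs.size;
  }
}

// Hypothetical rows: the second one is an exact copy of the first.
const rows = [
  { commonName: "Lionel Messi", firstName: "Lionel", lastName: "Messi", rating: 97 },
  { commonName: "Lionel Messi", firstName: "Lionel", lastName: "Messi", rating: 97 },
  { commonName: "Lionel Messi", firstName: "Lionel", lastName: "Messi", rating: 96 },
  { commonName: "Jamie Vardy", firstName: "Jamie", lastName: "Vardy", rating: 82 },
];

const allFour = new UniqueCollection(["commonName", "firstName", "lastName", "rating"]);
const nameOnly = new UniqueCollection(["commonName"]);
for (const row of rows) {
  allFour.insert(row);
  nameOnly.insert(row);
}

console.log(allFour.size);  // 3: only the exact copy is rejected
console.log(nameOnly.size); // 2: one row per commonName
```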
Upvotes: 1