MongoDB: Conditionally drop duplicates

Question

I have a collection of documents like so:

{
    "word": "foo",
    "likes": 10,
    "dislikes": 1,
},
{
    "word": "foo",
    "likes": 5,
    "dislikes": 9,
},

The trouble is, my collection is riddled with similar documents (sharing the same word, but different data). I would like to remove these similar, almost duplicate entries.

Now, an easy way would be to use a unique index:

db.entries.ensureIndex({'word' : 1}, {unique : true, dropDups : true})

But I feel like I can do better. Maybe I can use likes/dislikes data to calculate the ratio and keep only the best entries, while removing the rest.

I was wondering whether this is possible with MapReduce and some Mongo CLI JavaScript magic, or whether I should solve this problem programmatically using MongoDB primitives.

Edit: This cleanup is a one-time event, and performance doesn't matter.
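
To make the ratio idea concrete, here is a rough sketch of the "programmatic" route in plain JavaScript. It assumes "best" means the highest likes / (likes + dislikes), that a document with no votes scores 0, and that every document has an _id; likeRatio and idsToRemove are hypothetical helper names, not part of MongoDB.

```javascript
// Hypothetical helper: score a document by its share of likes.
// Assumption: a document with no votes at all scores 0 (avoids dividing by zero).
function likeRatio(doc) {
  const total = doc.likes + doc.dislikes;
  return total === 0 ? 0 : doc.likes / total;
}

// Hypothetical helper: given an array of documents, return the _id of every
// document that is not the best-scoring one for its word.
// On a tie, the document seen first is kept.
function idsToRemove(docs) {
  const best = new Map(); // word -> best document seen so far
  const losers = [];
  for (const doc of docs) {
    const current = best.get(doc.word);
    if (current === undefined) {
      best.set(doc.word, doc);
    } else if (likeRatio(doc) > likeRatio(current)) {
      losers.push(current._id); // the previous best is now a duplicate
      best.set(doc.word, doc);
    } else {
      losers.push(doc._id);
    }
  }
  return losers;
}

// In the mongo shell this might be applied roughly like so
// (loads the whole collection into memory, acceptable for a one-time cleanup):
//   var ids = idsToRemove(db.entries.find().toArray());
//   db.entries.remove({ _id: { $in: ids } });
```

Once only one document per word remains, the unique index on word could be created without needing dropDups.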

