daydreamer

Reputation: 91969

MongoDB: find unique documents between date range in a collection

I am not sure how to perform this task.

Here is the document structure:

name:
date_created:
val:

I need to find the unique documents created between January 2011 and October 2011.

I know that I can find the documents within a date range with

db.collection.find({'date_created': {'$gte': '2011-01-01', '$lt': '2011-10-30'}});  

and I can get the distinct names with

db.runCommand({'distinct': 'collection', 'key': 'name'})   
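The date-range query above compares `date_created` against strings. Here is a minimal plain-JavaScript sketch (runnable in Node, no MongoDB involved) of what those bounds select. It assumes `date_created` is stored as ISO `YYYY-MM-DD` strings, which the question does not state, and the sample dates are hypothetical. For that format, string order equals chronological order:

```javascript
// Hypothetical sample values; assumes date_created holds ISO 'YYYY-MM-DD' strings.
const dates = ['2010-12-31', '2011-01-01', '2011-06-15', '2011-10-29', '2011-10-30', '2011-10-31'];

// Same bounds as the query: $gte '2011-01-01' and $lt '2011-10-30'.
// JavaScript compares these strings character by character, so for this
// fixed-width format the comparison is chronological.
const inRange = dates.filter(d => d >= '2011-01-01' && d < '2011-10-30');

console.log(inRange); // [ '2011-01-01', '2011-06-15', '2011-10-29' ]
```

With `'$lt': '2011-10-30'`, documents from October 30 and 31 are excluded; an upper bound of `'2011-11-01'` would cover all of October. If `date_created` is a BSON Date rather than a string, string bounds match nothing, because MongoDB comparison operators only compare values of the same BSON type.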

Problem

The problem is that the collection contains duplicate documents that I need to remove.

How can I answer this question?

Find the unique documents created between January 2011 and October 2011, where uniqueness is based on the 'name' key.

UPDATE

@Sergio's answer is perfect. After running the query I got the following result. The output count is lower than the input count, which shows that the duplicates were removed.

{
    "result" : "temp_collection",
    "timeMillis" : 1509717,
    "counts" : {
        "input" : 592364,
        "emit" : 592364,
        "output" : 380827
    },
    "ok" : 1
}
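Reading the counts: every input document was emitted once, and map-reduce writes one output document per distinct name, so `input - output` is the number of duplicate documents dropped. A small sketch of that arithmetic over the result object shown above:

```javascript
// The map-reduce result reported above.
const mrResult = {
  result: 'temp_collection',
  timeMillis: 1509717,
  counts: { input: 592364, emit: 592364, output: 380827 },
  ok: 1
};

// One output document per distinct name, so input - output = duplicates dropped.
const duplicatesRemoved = mrResult.counts.input - mrResult.counts.output;
// timeMillis is in milliseconds; convert to minutes.
const minutes = mrResult.timeMillis / 60000;

console.log(duplicatesRemoved);  // 211537
console.log(minutes.toFixed(1)); // 25.2
```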

Upvotes: 4

Views: 9056

Answers (2)

ESV

Reputation: 7730

Since the aggregation framework was added (developed in the 2.1 series and first released in MongoDB 2.2), you can also do:

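db.collection.aggregate([
    // Keep only documents in the date range
    {$match: {'date_created': {'$gte': '2011-01-01', '$lt': '2011-10-30'}}},
    // Sort so that, within each name, the earliest document comes first
    {$sort: {name: 1, date_created: 1}},
    // One output document per distinct name, taken from the first document in each group
    {$group: {
        _id: '$name',
        date_created: {$first: '$date_created'},
        val: {$first: '$val'}
    }}
])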
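To see what the pipeline returns, here is a plain-JavaScript simulation (runnable in Node, no MongoDB involved) of the same `$match` → `$sort` → `$group`/`$first` logic over hypothetical sample documents. It assumes `date_created` holds ISO date strings and sorts by `name` and then `date_created`, so the document kept for each name is the earliest one:

```javascript
// Hypothetical sample documents; date_created assumed to be ISO 'YYYY-MM-DD' strings.
const docs = [
  { name: 'a', date_created: '2011-03-01', val: 1 },
  { name: 'b', date_created: '2011-02-01', val: 2 },
  { name: 'a', date_created: '2011-01-15', val: 3 },
  { name: 'c', date_created: '2012-01-01', val: 4 } // outside the range
];

// $match: same bounds as the pipeline.
const matched = docs.filter(d => d.date_created >= '2011-01-01' && d.date_created < '2011-10-30');

// $sort: by name, then by date_created (the secondary key is an assumption of this sketch).
const cmp = (x, y) => (x < y ? -1 : x > y ? 1 : 0);
matched.sort((x, y) => cmp(x.name, y.name) || cmp(x.date_created, y.date_created));

// $group with $first: keep the first document seen for each name.
const groups = new Map();
for (const d of matched) {
  if (!groups.has(d.name)) {
    groups.set(d.name, { _id: d.name, date_created: d.date_created, val: d.val });
  }
}
const grouped = [...groups.values()];

// grouped holds one entry per name: 'a' (earliest, val 3) and 'b' (val 2).
console.log(grouped);
```

A JavaScript `Map` keeps insertion order, but MongoDB does not guarantee the order of `$group` output; add a `$sort` stage after `$group` if the order matters.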

Upvotes: 2

Sergio Tulentsev

Reputation: 230336

This can be solved with map-reduce. Something like this should help:

// Emit each document keyed by its name
var map = function() {
  emit(this.name, this);
};

// vals holds documents emitted for one name (possibly only a partial batch,
// since reduce can be called more than once per key). Just pick one.
var reduce = function(key, vals) {
  return vals[0];
};

db.runCommand({
  mapreduce: 'collection',
  map: map,
  reduce: reduce,
  query: {'date_created': {'$gte': '2011-01-01', '$lt': '2011-10-30'}},
  out: 'temp_collection'
});

After this command returns, temp_collection should contain one document per distinct name, in the form { _id: <name>, value: <one of the original documents> }.
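Here is a plain-JavaScript simulation (runnable in Node, no MongoDB involved) of the same `map` and `reduce` functions over hypothetical sample documents. The `emit` helper is a hypothetical stand-in for the one MongoDB provides. The sketch mimics two MongoDB behaviors: reduce is skipped for keys with a single value, and reduce may be re-run on its own partial results. That second behavior is why `reduce` must return something shaped like an emitted value, which returning one document does:

```javascript
// Hypothetical sample documents; date_created assumed to be ISO 'YYYY-MM-DD' strings.
const docs = [
  { _id: 1, name: 'a', date_created: '2011-03-01', val: 1 },
  { _id: 2, name: 'b', date_created: '2011-02-01', val: 2 },
  { _id: 3, name: 'a', date_created: '2011-01-15', val: 3 },
  { _id: 4, name: 'a', date_created: '2011-05-20', val: 5 }
];

// Hypothetical stand-in for MongoDB's emit: collect values per key.
const emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// Same map and reduce as the answer.
const map = function () { emit(this.name, this); };
const reduce = function (key, vals) { return vals[0]; };

// Apply the query filter, then run map with `this` bound to each document.
docs
  .filter(d => d.date_created >= '2011-01-01' && d.date_created < '2011-10-30')
  .forEach(d => map.call(d));

const out = [];
for (const [key, vals] of emitted) {
  // MongoDB skips reduce for keys with a single value.
  if (vals.length === 1) { out.push({ _id: key, value: vals[0] }); continue; }
  // Reduce two partial batches separately, then re-reduce their results.
  const half = Math.ceil(vals.length / 2);
  const partials = [reduce(key, vals.slice(0, half)), reduce(key, vals.slice(half))];
  out.push({ _id: key, value: reduce(key, partials) });
}

// out holds one { _id: name, value: document } entry for 'a' and one for 'b'.
console.log(out);
```

Which duplicate survives depends on the order in which MongoDB feeds values to `reduce`, so the kept document for a name is arbitrary unless the reduce function chooses explicitly (for example, by comparing `date_created`).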

Upvotes: 6
