user3169506

Reputation: 93

How can I aggregate documents by time interval in MongoDB?

I need to aggregate my collection based on a certain time interval.

Contrary to what you might expect, I don't need counts per hour or per day, for example.

I need to aggregate based on a 30-minute interval (or any other length). Let's say the first document was created at 3:45PM, and 5 more documents were created between 3:45PM and 4:15PM. That makes 6 documents in this time interval, so the first document of the MapReduce result should have a count of 6.

Let's say the next document is created at 4:35PM and three more at 4:40PM.

So the next document of the MapReduce result is a document with the count of 4.

And so on...
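
To make the expected grouping concrete, here is a minimal sketch in plain JavaScript (not MongoDB code). The function name countWindows, the sample date, and the choice of a half-open window (start inclusive, start + 30 minutes exclusive) are assumptions for illustration only.

```javascript
// Count documents per window. A window opens at the first timestamp that is
// not covered by the previous window and lasts intervalMs milliseconds.
function countWindows(timestamps, intervalMs) {
    var sorted = timestamps.slice().sort(function (a, b) { return a - b; });
    var windows = [];
    var current = null;
    sorted.forEach(function (t) {
        if (current === null || t - current.start >= intervalMs) {
            current = {start: t, count: 0};
            windows.push(current);
        }
        current.count += 1;
    });
    return windows;
}

// Sample data matching the description above (the date itself is made up).
function at(hhmm) { return new Date("2014-01-01T" + hhmm + ":00Z"); }
var sample = ["15:45", "15:50", "15:55", "16:00", "16:05", "16:10",
              "16:35", "16:40", "16:40", "16:40"].map(at);
var result = countWindows(sample, 30 * 60 * 1000);
console.log(result.map(function (w) { return w.count; })); // [ 6, 4 ]
```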

Currently my map function looks like this:

var map = function() {
    var key = {name: this.name, minute: this.timestamp.getMinutes()};
    emit(key, {count: 1});
};

So nothing special. Currently I group by the minute, which is not what I want in the end. Instead of the minute, the key needs to reflect the time interval described above.

And my reduce function:

var reduce = function(key, values)
{
   var sum = 0;
   values.forEach(function(value)
   {
       sum += value['count'];
   });
   return {count: sum};
};

The output looks like this:

{
0: "{ "_id" : { "name" : "A" , "minute" : 11.0} , "value" : { "count" : 1.0}}",
1: "{ "_id" : { "name" : "B" , "minute" : 41.0} , "value" : { "count" : 6.0}}",
2: "{ "_id" : { "name" : "B" , "minute" : 42.0} , "value" : { "count" : 3.0}}",
3: "{ "_id" : { "name" : "C" , "minute" : 41.0} , "value" : { "count" : 2.0}}",
4: "{ "_id" : { "name" : "C" , "minute" : 42.0} , "value" : { "count" : 2.0}}",
5: "{ "_id" : { "name" : "D" , "minute" : 11.0} , "value" : { "count" : 1.0}}",
6: "{ "_id" : { "name" : "E" , "minute" : 16.0} , "value" : { "count" : 1.0}}"
}

So it counts / aggregates documents per minute, but NOT by my custom time interval.

Any ideas about this?

Upvotes: 2

Views: 4639

Answers (1)

Mzzl

Reputation: 4126

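Edit: My original map-reduce example didn't work, but I think the aggregation pipeline below does roughly what you want. The $project stage defines a field, time, that holds the minutes of your timestamp rounded down to a multiple of 5. An integer division would make this easy, but as far as I know the MongoDB query language doesn't support that at this time, so instead I subtract (minutes mod 5) from the minutes to get a number that changes every 5 minutes. Grouping by name and this time value should then do the trick. Two caveats: $minute only returns the minute within the hour (0-59), so documents from different hours that land in the same 5-minute slot are counted together, and the buckets are fixed slots on the clock rather than windows that start at the first document.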

var query = [
    {
        // Round the minute of each timestamp down to a multiple of 5.
        "$project": {
            "_id":"$_id",
            "name":"$name",
            "time": {
                "$subtract": [
                    {"$minute":"$timestamp"},
                    {"$mod": [{"$minute":"$timestamp"}, 5]}
                ]
            }
        }
    },
    {
        // Count the documents per name and 5-minute slot.
        "$group": {"_id": {"name": "$name", "time": "$time"}, "count":{"$sum":1}}
    }
];
db.foo.aggregate(query);
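
Because $minute only gives the minute within the hour, the pipeline above cannot express a 30-minute bucket that stays distinct across hours and days. One possible variation, sketched here under assumptions, rounds the whole timestamp down instead. It assumes timestamp is stored as a BSON date and that your server version allows subtracting one date from another with $subtract. The names intervalMs, epochMs, pipeline, bucket and bucketStart are mine, not part of the question.

```javascript
var intervalMs = 30 * 60 * 1000; // 30 minutes; change as needed

// Milliseconds since the Unix epoch: subtracting two dates yields a number.
var epochMs = {"$subtract": ["$timestamp", new Date(0)]};

var pipeline = [
    {
        "$project": {
            "name": "$name",
            // Round the epoch milliseconds down to a multiple of intervalMs.
            "bucket": {"$subtract": [epochMs, {"$mod": [epochMs, intervalMs]}]}
        }
    },
    {
        "$group": {"_id": {"name": "$name", "bucket": "$bucket"}, "count": {"$sum": 1}}
    }
];
// In the mongo shell: db.foo.aggregate(pipeline)

// The same arithmetic in plain JavaScript, to check a single value by hand.
function bucketStart(date, ms) {
    var t = date.getTime();
    return new Date(t - (t % ms));
}
```

The bucket value comes back as a number of milliseconds, so new Date(bucket) turns it into the start of the slot. These are still fixed slots aligned to the clock in UTC, not windows that open at the first document.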

Upvotes: 1
