Reputation: 93
I need to aggregate my collection based on a certain time interval.
As you might guess, I don't need to count per hour or per day.
I need to aggregate based on a 30-minute interval (or any other). Let's say the first document was created at 3:45PM. Then there are 5 more documents, created between 3:45PM and 4:15PM. So in this time interval, I have 6 documents, and the first document of the MapReduce result is a document with a count of 6.
Let's say the next document is created at 4:35PM and three more at 4:40PM.
So the next document of the MapReduce result is a document with a count of 4.
And so on...
Currently my map function looks like this:
var map = function() {
    var key = {name: this.name, minute: this.timestamp.getMinutes()};
    emit(key, {count: 1});
};
So nothing special. Currently I group by the minute, which is not what I want in the end. Instead of the minute, I need to be able to key on the time interval described above.
And my reduce function:
var reduce = function(key, values) {
    var sum = 0;
    values.forEach(function(value) {
        sum += value['count'];
    });
    return {count: sum};
};
The output looks like this:
{
0: "{ "_id" : { "name" : "A" , "minute" : 11.0} , "value" : { "count" : 1.0}}",
1: "{ "_id" : { "name" : "B" , "minute" : 41.0} , "value" : { "count" : 6.0}}",
2: "{ "_id" : { "name" : "B" , "minute" : 42.0} , "value" : { "count" : 3.0}}",
3: "{ "_id" : { "name" : "C" , "minute" : 41.0} , "value" : { "count" : 2.0}}",
4: "{ "_id" : { "name" : "C" , "minute" : 42.0} , "value" : { "count" : 2.0}}",
5: "{ "_id" : { "name" : "D" , "minute" : 11.0} , "value" : { "count" : 1.0}}",
6: "{ "_id" : { "name" : "E" , "minute" : 16.0} , "value" : { "count" : 1.0}}"
}
So it counts / aggregates documents per minute, but NOT by my custom time interval.
Any ideas about this?
Upvotes: 2
Views: 4639
Reputation: 4126
Edit: My example using map reduce didn't work, but I think this does roughly what you want to do.
I use $project to define a field, time, that holds the minutes from your timestamp rounded down to 5-minute intervals. This would be easy with an integer divide, but I don't think the MongoDB query language supports that at this time, so instead I subtract minutes mod 5 from the minutes to get a number that changes every 5 minutes. Grouping by the name and this time counter should then do the trick.
query = [
    {
        "$project": {
            "_id": "$_id",
            "name": "$name",
            "time": {
                "$subtract": [
                    {"$minute": "$timestamp"},
                    {"$mod": [{"$minute": "$timestamp"}, 5]}
                ]
            }
        }
    },
    {
        "$group": {"_id": {"name": "$name", "time": "$time"}, "count": {"$sum": 1}}
    }
]
db.foo.aggregate(query)
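Note that $minute only gives the minute within the hour, so with the pipeline above, documents from 3:40PM and 4:40PM end up under the same time value. For the map-reduce route from the question, the same rounding idea can be applied to the full timestamp instead. Below is a minimal sketch, assuming timestamp is stored as a Date and that fixed windows aligned to the clock (3:30, 4:00, 4:30, ...) are acceptable; it does not start a window at the first document's creation time. The names intervalMs and interval are made up for illustration.

```javascript
// Sketch of a map function that keys on a fixed-size time window.
// Assumption: this.timestamp is a Date; windows are aligned to the Unix epoch.
var map = function() {
    var intervalMs = 30 * 60 * 1000; // 30 minutes; change to any interval length
    var ms = this.timestamp.getTime();
    // Round down to the start of the window that contains the timestamp.
    var windowStart = new Date(Math.floor(ms / intervalMs) * intervalMs);
    var key = {name: this.name, interval: windowStart};
    emit(key, {count: 1});
};
```

The reduce function from the question should work unchanged with this key, since it only sums the count values. Each result's _id.interval is then the start of the window the documents fell into.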
Upvotes: 1