Reputation: 197
I've collected about 10 mio documents spaning a few weeks in my mongodb database, and I want to be able to calculate some simple statistics and output them. The statistics I'm trying to get is the average of the rating on each document within a timespan, in one hour intervals.
To give an idea of what I'm trying to do, follow this sudo code:
var dateTimeStart;
var dateTimeEnd;
var distinctHoursBetweenDateTimes = getHours(dateTimeStart, dateTimeEnd);
var totalResult=[];
foreach( distinctHour in distinctHoursBetweenDateTimes )
tmpResult = mapreduce_getAverageRating( distinctHour, distinctHour +1 )
totalResult[distinctHour] = tmpResult;
return totalResult;
My document structure is something like: {_id, rating, topic, created_at}
Created_at is the date I'm gathering my statistics based on (time of insertion and time created are not always the same)
I've created an index on the created_at field.
The following is my mapreduce:
map = function (){
emit( this.Topic , { 'total' : this.Rating , num : 1 } );
};
reduce = function (key, values){
var n = {'total' : 0, num : 0};
for ( var i=0; i<values.length; i++ ){
n.total += values[i].total;
n.num += values[i].num;
}
return n;
};
finalize = function(key, res){
res.avg = res.total / res.num;
return res;
};
I'm pretty sure this can be done more effectively - possibly by letting mongo do more work, instead of running several map-reduce statements in a row.
At this point each map-reduce takes about 20-25 seconds so counting statistics for all the hours over a few days suddenly takes up a very long time.
My impression is that mongo should be suited for this kind of work - hence I must obviously be doing something wrong.
Thanks for your help!
Upvotes: 1
Views: 184
Reputation: 69703
And I assume the time is part of the documents you are MapReducing?
When you run the MapReduce over all documents, determine the hour in the map function and add it to the key you emit, you could do all this in a single MapReduce.
Upvotes: 1