Reputation: 38399
I have a CouchDB database with documents representing 100,000 events. Each event has, among other things, a time that it took place (stored as an array of [Year,month,day,hour,minute,second]) and a score. I would like to make a graph of average scores over time. In order to do this, I created a view with a map that emitted keys bucketed into intervals, and a reduce function that averaged keys in a bucket.
This works quite well. When queried for the total average, CouchDB returns a result almost instantly. When I bucket by day and get a hundred or so results, my CouchDB database takes a couple hundred ms to produce a result. Using 1000+ buckets, a query takes several seconds to return. While this query is running, my CPU jumps to 100% and my disk is fairly quiet.
I am a bit puzzled by this slowdown. Since reducing over everything seems to be instant, I have concluded that the overhead may be in producing a JSON document with 1000+ entries. Is CouchDB not capable of returning 1000 result rows in a speedy fashion?
I'm a bit of a CouchDB newbie, so it's completely possible my map or reduce function is awful, or maybe there's a flag somewhere in the configuration that allows CouchDB to make use of more memory. Or maybe it's the case that CouchDB is simply strongly optimized for aggregate querying over returning many results.
Advice?
Upvotes: 1
Views: 351
Reputation: 4631
If when you say 'average' you mean 'mean average', there's no way to get that directly with a map/reduce pair as it's recursive (each level of recursion will introduce rounding errors).
A better solution is to collect the sum and count of items, and you can then trivially derive the mean average on the client.
It sounds like your reduce function is the cause of the slowdown, but I can't tell until you show it.
Assuming your map function is this;
function(doc) {
emit([doc.year, doc.month, doc.day, doc.hour, doc.minute, doc.second], doc.score);
}
and your reduce function is;
_stats
then your results should be very fast (and scalable). The built-in _stats function will return results like this;
{"sum":2,"count":2,"min":1,"max":1,"sumsqr":2}
So, for any call to your view, you can get the sum of the scores and the number of the scores, the mean average is easily derived.
Upvotes: 2