andy

Reputation: 1882

CouchDB, MapReduce: query a time slice

To monitor an application with CouchDB, I need to sum up a field of my data (for example, the logged time needed to execute a method).

That's no problem with map-reduce, but I need to sum up only the data recorded in a specific time slice.

Example records:

{_id: 1, methodID:1, recorded: 100, timeneeded: 10}, 
{_id: 2, methodID:1, recorded: 200, timeneeded: 11}, 
{_id: 3, methodID:2, recorded: 200, timeneeded: 2}, 
{_id: 4, methodID:1, recorded: 300, timeneeded: 6}, 
{_id: 5, methodID:2, recorded: 310, timeneeded: 3}, 
{_id: 6, methodID:1, recorded: 400, timeneeded: 9}

Now I would like to get just the sum of timeneeded of all records that have been recorded in the range of 200 to 350 and grouped by methodID. (That would be 17 for methodID:1 and 5 for methodID:2.)
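
To make the expected result concrete, here is a minimal plain-JavaScript sketch (run outside CouchDB, for example in Node.js) that computes it directly from the example records. The helper name `sumBySlice` is made up for illustration and is not part of any CouchDB API.

```javascript
// Example records from the question.
const records = [
  {_id: 1, methodID: 1, recorded: 100, timeneeded: 10},
  {_id: 2, methodID: 1, recorded: 200, timeneeded: 11},
  {_id: 3, methodID: 2, recorded: 200, timeneeded: 2},
  {_id: 4, methodID: 1, recorded: 300, timeneeded: 6},
  {_id: 5, methodID: 2, recorded: 310, timeneeded: 3},
  {_id: 6, methodID: 1, recorded: 400, timeneeded: 9}
];

// Hypothetical helper: sums timeneeded per methodID for all records
// with from <= recorded <= to.
function sumBySlice(docs, from, to) {
  const totals = {};
  for (const doc of docs) {
    if (doc.recorded >= from && doc.recorded <= to) {
      totals[doc.methodID] = (totals[doc.methodID] || 0) + doc.timeneeded;
    }
  }
  return totals;
}

console.log(JSON.stringify(sumBySlice(records, 200, 350))); // {"1":17,"2":5}
```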

How can I do that?


I now tried it with a list function that's using WickedGrey's idea. See my functions here:

map function:

function(doc) {  
  emit([ doc.recorded], {methodID:doc.methodID, timeneeded:doc.timeneeded}); 
}

list function:

"function(head, req) {  
  var combined_values = {};
  var row;
  while (row = getRow()) {
    // The emitted value of a view row is available as row.value.
    var id = row.value.methodID;
    if (id in combined_values) {
      combined_values[id] += row.value.timeneeded;
    }
    else {
      combined_values[id] = row.value.timeneeded;
    }
  }

  for (var methodID in combined_values) {
    send(toJSON({method: methodID, timeneeded: combined_values[methodID]}));
  }
}"

Now I have two problems:

1. I always get the result as a file, and Firefox asks me whether I want to download it instead of displaying it in the browser as it does when I query a classic view.
2. As I understand it, the results are now calculated on the fly in the list function. I expect this to be slow with hundreds of millions of records. Any ideas how to make it faster?

Thank you for your help! andy

Upvotes: 0

Views: 741

Answers (2)

MetaThis

Reputation: 101

function map(doc) {
  if(doc.methodID && doc.recorded && doc.timeneeded) {
    emit([doc.methodID,doc.recorded], doc.timeneeded);
  }
}

//reduce
_sum
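
A sketch of how such a view could be queried: with the compound key [methodID, recorded] and the built-in _sum reduce, a startkey/endkey range for one methodID should return the sum for that method within the time slice, so one request per methodID is needed. The database, design document, and view names in the path below are placeholders, and `sliceQuery` is a hypothetical helper, not a CouchDB function.

```javascript
// Hypothetical helper: builds the query string for one methodID and one
// time slice. View keys are passed as URL-encoded JSON.
function sliceQuery(methodID, from, to) {
  const startkey = encodeURIComponent(JSON.stringify([methodID, from]));
  const endkey = encodeURIComponent(JSON.stringify([methodID, to]));
  return "startkey=" + startkey + "&endkey=" + endkey;
}

// Placeholder path: adjust the database, design document and view names.
const viewPath = "/mydb/_design/stats/_view/by_method_and_time";

console.log(viewPath + "?" + sliceQuery(1, 200, 350));
console.log(viewPath + "?" + sliceQuery(2, 200, 350));
```

With the example records from the question, the first request should yield a single row with value 17 and the second a single row with value 5.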

Upvotes: 1

Eli Stevens

Reputation: 1447

You can't use a map key to filter by one set of criteria, but group by another in CouchDB. However, you can filter the keys by time range, and group with a reduce function. Try something like this:

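function map(doc) {
    // An object literal cannot use doc.methodID as a key directly,
    // so build the {methodID: timeneeded} dictionary explicitly.
    var totals = {};
    totals[doc.methodID] = doc.timeneeded;
    emit(doc.recorded, totals);
}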

function reduce(key, values, rereduce) {
    var combined_values = {};
    for (var i in values) {
        var totals = values[i];
        for (var methodID in totals) {
            if (methodID in combined_values) {
                combined_values[methodID] += totals[methodID];
            }
            else {
                combined_values[methodID] = totals[methodID];
            }
        }
    }
    return combined_values;
}

That should allow you to specify a startkey/endkey for the time range (for example startkey=200&endkey=350), and with group_level=0 you should get a single value containing the dictionary that you're looking for.
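
To see what this kind of reduce produces, the following sketch runs the same merging logic in plain JavaScript over the example records from the question, outside CouchDB. The range filter stands in for startkey=200&endkey=350, and the final call imitates a rereduce over partial results; `mergeTotals` is only an illustrative name.

```javascript
// Example records from the question.
const docs = [
  {_id: 1, methodID: 1, recorded: 100, timeneeded: 10},
  {_id: 2, methodID: 1, recorded: 200, timeneeded: 11},
  {_id: 3, methodID: 2, recorded: 200, timeneeded: 2},
  {_id: 4, methodID: 1, recorded: 300, timeneeded: 6},
  {_id: 5, methodID: 2, recorded: 310, timeneeded: 3},
  {_id: 6, methodID: 1, recorded: 400, timeneeded: 9}
];

// Same merging logic as the reduce function: add up per-methodID totals.
// It works for mapped values and for partial results alike, because both
// are {methodID: total} dictionaries.
function mergeTotals(values) {
  const combined = {};
  for (const totals of values) {
    for (const methodID in totals) {
      combined[methodID] = (combined[methodID] || 0) + totals[methodID];
    }
  }
  return combined;
}

// Stand-in for the map step plus startkey=200&endkey=350.
const mapped = docs
  .filter((doc) => doc.recorded >= 200 && doc.recorded <= 350)
  .map((doc) => ({ [doc.methodID]: doc.timeneeded }));

// Reduce two chunks separately, then merge them as a rereduce would.
const partialA = mergeTotals(mapped.slice(0, 2));
const partialB = mergeTotals(mapped.slice(2));
const result = mergeTotals([partialA, partialB]);

console.log(JSON.stringify(result)); // {"1":17,"2":5}
```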

Edit: Also, this thread might be of interest:

http://couchdb-development.1959287.n2.nabble.com/reduce-limit-error-td2789734.html

It discusses an option to turn off the "reduce must shrink" error and, further down the thread, provides another way of achieving the same goal: using a list function. That might be a better approach than what I've outlined here. :(
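
A rough sketch of the list-function variant, assuming a view that emits doc.recorded as the key and {methodID, timeneeded} as the value (that view shape is an assumption, not something taken from the linked thread). Inside CouchDB, `start`, `getRow`, `send` and `toJSON` are provided by the list-function environment; the stubs at the top exist only so the sketch can be run under Node.js.

```javascript
// Stand-ins for the CouchDB list API, for running outside CouchDB only.
// The rows imitate what the view would return for startkey=200&endkey=350.
const rows = [
  { key: 200, value: { methodID: 1, timeneeded: 11 } },
  { key: 200, value: { methodID: 2, timeneeded: 2 } },
  { key: 300, value: { methodID: 1, timeneeded: 6 } },
  { key: 310, value: { methodID: 2, timeneeded: 3 } }
];
let output = "";
let responseHeaders = null;
const start = (resp) => { responseHeaders = resp.headers; };
const getRow = () => rows.shift();
const send = (chunk) => { output += chunk; };
const toJSON = JSON.stringify;

// The list function itself: sums timeneeded per methodID over the rows
// selected by startkey/endkey and sends a single JSON object.
function list(head, req) {
  start({ headers: { "Content-Type": "application/json" } });
  var totals = {};
  var row;
  while ((row = getRow())) {
    var id = row.value.methodID;
    totals[id] = (totals[id] || 0) + row.value.timeneeded;
  }
  send(toJSON(totals));
}

list({}, {});
console.log(output); // {"1":17,"2":5}
```

Note that the summing still happens at query time for every row in the requested range, so this trades the reduce restriction for more work per request.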

Upvotes: 1
