sarwar

Reputation: 2845

CouchDB reduce: error only in descending order

I'm having trouble solving what seems like a fairly straightforward problem in CouchDB 1.0.1. My data is basically a log of drugs dispensed from 30-odd clinics, with some basic data about each drug and a timestamp. I'm passing a fairly ordinary group of objects as the map result into reduce, in the pseudo-form:

Key:ClinicName, Value:{"vaccine":DrugType, "stamp":TimeStamp}

The aim of my reduce function is to allow a quick reference of the amounts of each type of drug dispensed.

Map

function(doc) {
   if(doc.type=="dose"){
    emit(doc.clinicName, {"vaccine":doc.vaccine,"stamp":doc.timestamp});
   }
}

Reduce

function(keys, values){
  var indexes = Object.keys(values);
  var vCount = new Object;
  for (var c in indexes){
    var val = values[c]
    var vname = val.vaccine
    if(vCount.hasOwnProperty(vname)){
      vCount[vname] = vCount[vname] + 1;
    }
    else{
      vCount[vname] = 1;
    }  
  }
  return vCount;
}

This works perfectly when I query with ?key= set to a specific ClinicName, as long as descending=false and group=true. As soon as I set descending to true, my results are cut off about halfway through.

Two questions:

  1. Why should the result order matter for the reduce function? With reduce off, the results are the same forwards and backwards.
  2. I'd read somewhere that if your reduction doesn't produce a single scalar, you're probably doing it wrong. Strange behavior aside, if this is a poor approach, what's the right way to present this sort of data from a log-style data source?

Upvotes: 1

Views: 313

Answers (3)

sarwar

Reputation: 2845

Jason's answer is excellent and correct, but for anyone stumbling upon this, my underlying issue was a lack of understanding of how reduce works.

Most importantly, the output of the reduce function must itself be reducible, because CouchDB does not necessarily reduce all the rows for a key in a single pass. If you have 1000 rows matching a key, Couch may take 10 sets of 100 and apply the function to each. It will then rereduce the 10 outputs of the previous reductions to arrive at the result for the keyset.
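As a rough illustration of that point, here is a sketch of how the reduce function from the question could handle both passes. It assumes the map output is unchanged (values of the form {"vaccine": ..., "stamp": ...}) and relies on the third argument CouchDB passes to reduce functions, rereduce. The name reduceVaccineCounts is made up so the function can be run locally; in a design document you would store it as an anonymous function.

```javascript
// Sketch only: a rereduce-aware variant of the question's reduce function.
// "reduceVaccineCounts" is a hypothetical name used for local testing.
function reduceVaccineCounts(keys, values, rereduce) {
  var counts = {};
  values.forEach(function (val) {
    if (rereduce) {
      // Second pass: each value is a {vaccineName: count} object
      // returned by an earlier call, so add the counts together.
      Object.keys(val).forEach(function (name) {
        counts[name] = (counts[name] || 0) + val[name];
      });
    } else {
      // First pass: each value is a raw map value; count one dose.
      counts[val.vaccine] = (counts[val.vaccine] || 0) + 1;
    }
  });
  return counts;
}
```

The original function treated every value as a raw map row, so when it was handed the output of an earlier reduction, val.vaccine was undefined and the counts were lost.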

It's probably best that you read the docs...

Read the Reduce/Rereduce Sections of the CouchDB docs

Upvotes: 0

JasonSmith

Reputation: 73752

If you want to know the amount (count) of vaccines per clinic, then you need that in the key.

// pseudo-form
Key:[ClinicName, DrugType], Value:{"stamp":TimeStamp}

Next, your reduce "function" can simply be the string "_count".

With this, you can set ?group_level=2 and get one row per clinic per vaccine, with a count of all doses dispensed. This may not be relevant to you, but you also get for free the ability to count doses (of all drugs) per clinic with ?group_level=1.
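A minimal sketch of a map function for that key layout, assuming the same document fields as the map function in the question (type, clinicName, vaccine, timestamp). The function name is hypothetical and only there so it can be called locally; emit is the global that CouchDB provides to map functions.

```javascript
// Sketch only: emit one row per dose, keyed on [clinic, vaccine].
// Pair this with the built-in reduce by setting the view's reduce to "_count".
function mapDoseByClinicAndVaccine(doc) {
  if (doc.type == "dose") {
    // Compound key: group_level=1 groups by clinic,
    // group_level=2 groups by clinic and vaccine.
    emit([doc.clinicName, doc.vaccine], {"stamp": doc.timestamp});
  }
}
```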

To get the total count of vaccines across all clinics, that view must be keyed on the drug only.

// pseudo-form
Key:DrugType, Value:{"stamp":TimeStamp}

The primary point is that reduce must always work on rows that are adjacent to each other in the map output. Then you can use ?group_level or startkey/endkey to get meaningful results.
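To make the adjacency point concrete, here is a small local simulation of what a count reduce with group_level does to rows that are already sorted by key. This is not CouchDB's implementation, just an illustration; countByGroupLevel and the sample rows are invented for the example.

```javascript
// Sketch only: simulate a count reduce with group_level over sorted view rows.
// Rows must already be in key order, as they are in a CouchDB view.
function countByGroupLevel(rows, level) {
  var out = [];
  rows.forEach(function (row) {
    var prefix = row.key.slice(0, level);
    var last = out[out.length - 1];
    // Only adjacent rows sharing the same key prefix are merged.
    if (last && JSON.stringify(last.key) === JSON.stringify(prefix)) {
      last.value += 1;
    } else {
      out.push({ key: prefix, value: 1 });
    }
  });
  return out;
}

var sampleRows = [
  { key: ["East", "BCG"] },
  { key: ["East", "MMR"] },
  { key: ["East", "MMR"] },
  { key: ["North", "MMR"] }
];
// countByGroupLevel(sampleRows, 2) gives East/BCG: 1, East/MMR: 2, North/MMR: 1
// countByGroupLevel(sampleRows, 1) gives East: 3, North: 1
```

Because the clinic comes first in the key, all rows for one clinic sit next to each other, which is what lets both group levels produce sensible counts.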

Upvotes: 1

JasonSmith

Reputation: 73752

The answer to question 2 is easier.

The "single scalar" rule of thumb is fine for getting started, but I have seen many advanced applications use objects exactly like you do.

For example, see this recent answer about summing up related values in an object: https://stackoverflow.com/a/10082894/2938

Upvotes: 1
