Andrew Theken

Reputation: 3480

What's the best way to build an aggregate document in couchdb?

Alright SO users. I am trying to learn and use CouchDB. I have the StackExchange data export loaded as one document per row from the XML file, so the documents in Couch look basically like this:

//This is a representation of a question:
{
 "Id" : "1",
 "PostTypeId" : "1",
 "Body" : "..."
}

//This is a representation of an answer
{
 "Id" : "1234",
 "ParentId" : "1",
 "PostTypeId" : "2",
 "Body" : "..."
}

(Please ignore the fact that the import basically treated all the attributes as text; I understand that using real numbers, bools, etc. could yield better space/processing efficiency.)

What I'd like to do is to map this into a single aggregate document per question, holding the question and all of its answers.

Here's my map:

function(doc) {
    if(doc.PostTypeId === "2"){
      emit(doc.ParentId, doc);
    }
    else{
        emit(doc.Id, doc);
    }
}
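To see what this map produces, here is a minimal offline sketch in plain Node.js. `emit` is a stub standing in for CouchDB's query-server global, and the sample documents are hypothetical. Each answer is emitted under its `ParentId`, so it lands on the same key as its question.

```javascript
// Offline harness for the map function above. `emit` is a stub for
// CouchDB's query-server global; the sample docs are hypothetical.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map function from the question, unchanged.
function map(doc) {
  if (doc.PostTypeId === "2") {
    emit(doc.ParentId, doc);
  } else {
    emit(doc.Id, doc);
  }
}

const docs = [
  { Id: "1", PostTypeId: "1", Body: "question" },
  { Id: "1234", ParentId: "1", PostTypeId: "2", Body: "answer" },
  { Id: "7", PostTypeId: "1", Body: "another question" },
];
docs.forEach(map);

// CouchDB delivers view rows sorted by key; plain string comparison is used
// here, which only approximates CouchDB's collation rules.
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

for (const row of rows) {
  console.log(row.key, row.value.Id);
}
```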

And here's the reduce:

function(keys, values, rereduce){
    var retval = {question: null, answers : []};

    if(rereduce){
        // values are outputs of earlier reduce calls: merge them
        for(var i = 0; i < values.length; i++){
            var current = values[i];
            retval.answers = retval.answers.concat(current.answers);
            if(retval.question === null && current.question !== null){
                retval.question = current.question;
            }
        }
    }
    else{
        // values are the raw documents emitted by the map
        for(var i = 0; i < values.length; i++){
            var current = values[i];

            if(current.PostTypeId === "2"){
                retval.answers.push(current);
            }
            else{
                retval.question = current;
            }
        }
    }
    return retval;
}

Theoretically, this would yield a document like this:

{
    "question" : {...},
    "answers" : [answer1, answer2, answer3]
}

But instead I am getting the standard "does not reduce fast enough" error.
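That error comes from CouchDB's reduce-limit check (as far as I know, the `reduce_limit` option under `[query_server_config]`), which rejects reduce functions whose output does not shrink noticeably compared to their input. The sketch below runs in plain Node.js with hypothetical sample documents and measures this for the reduce above, with the question's `retval.push(current)` written as `retval.answers.push(current)`, which is what was intended. However many answers there are, the output comes out slightly larger than the input, because the reduce just repackages every value it receives.

```javascript
// Sketch: measure how much the question's reduce shrinks its input.
// Same logic as the reduce in the question (answers pushed onto
// retval.answers); the sample values are hypothetical.
function reduce(keys, values, rereduce) {
  var retval = { question: null, answers: [] };
  for (var i = 0; i < values.length; i++) {
    var current = values[i];
    if (rereduce) {
      retval.answers = retval.answers.concat(current.answers);
      if (retval.question === null && current.question !== null) {
        retval.question = current.question;
      }
    } else if (current.PostTypeId === "2") {
      retval.answers.push(current);
    } else {
      retval.question = current;
    }
  }
  return retval;
}

// Ratio of serialized output size to serialized input size for one
// question with n answers.
function sizeRatio(n) {
  const values = [{ Id: "1", PostTypeId: "1", Body: "question body" }];
  for (let i = 0; i < n; i++) {
    values.push({ Id: String(1000 + i), ParentId: "1", PostTypeId: "2", Body: "answer body" });
  }
  const inputSize = JSON.stringify(values).length;
  const outputSize = JSON.stringify(reduce(null, values, false)).length;
  return outputSize / inputSize;
}

for (const n of [1, 10, 100]) {
  console.log(n, sizeRatio(n).toFixed(2));
}
```

A reduce is meant to summarize (counts, sums, small aggregates); gathering whole documents is better left to the view rows themselves.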

Am I using map/reduce incorrectly? Is there a well-established pattern for accomplishing this in CouchDB?

(Please also note that I would like a response with the complete documents, where the question is the "parent" and the answers are the "children", not just the Ids.)

Upvotes: 2

Views: 1058

Answers (1)

Andrew Theken

Reputation: 3480

So, the "right" way to accomplish what I'm trying to do above is to add a "list" function to my design document (the end result I am after appears to be referred to as "collating documents").

At any rate, you can configure your map however you like and combine it with a "list" function in the same design document.
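Design documents store these functions as JSON strings, and strict JSON does not allow literal line breaks inside a string, so a design document laid out like the one below has to be escaped onto single lines before it is uploaded. One way to avoid hand-escaping is to write ordinary JavaScript functions and serialize them with `Function.prototype.toString()` and `JSON.stringify`. A minimal sketch, where `buildDesignDoc`, the `example` design document and its `by_key`/`count` functions are all hypothetical:

```javascript
// Sketch: assemble a design document from real JavaScript functions so the
// map and list source need no hand-escaping. `buildDesignDoc` and the
// example functions are hypothetical.
function buildDesignDoc(name, views, lists) {
  const doc = { _id: "_design/" + name, language: "javascript", views: {}, lists: {} };
  for (const [viewName, mapFn] of Object.entries(views)) {
    // CouchDB stores function source as a string.
    doc.views[viewName] = { map: mapFn.toString() };
  }
  for (const [listName, listFn] of Object.entries(lists)) {
    doc.lists[listName] = listFn.toString();
  }
  return doc;
}

// These functions only run inside CouchDB, where emit/getRow/send exist;
// here they are serialized, never called.
const designDoc = buildDesignDoc(
  "example",
  { by_key: function (doc) { emit(doc.Id, null); } },
  { count: function (head, req) { var n = 0; while (getRow()) { n++; } send(String(n)); } }
);

// JSON.stringify escapes quotes and newlines inside the function source.
console.log(JSON.stringify(designDoc, null, 2));
```

The resulting JSON can be sent as the body of a PUT to `/<db>/_design/example`; updating an existing design document also requires its current `_rev`.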

To solve the above question, I eliminated my reduce (keeping only the map function) and added a list function to the design document, like the following:

{
   "_id": "_design/posts",
   "_rev": "11-8103b7f3bd2552a19704710058113b32",
   "language": "javascript",
   "views": {
       "by_question_id": {
           "map": "function(doc) {
                if(doc.PostTypeId === \"2\"){
                    emit(doc.ParentId, doc);
                }
                else{
                    emit(doc.Id, doc);
                }
            }"
       }
   },
   "lists": {
       "aggregated": "function(head, req){ 
                         start({\"headers\": {\"Content-Type\": \"application/json\"}});
                        var currentRow = null;
                        var currentObj = null; 
                        var retval = []; 
                        while(currentRow = getRow()){
                            if(currentObj === null || currentRow.key !== currentObj.key){
                                currentObj = {key: currentRow.key, question : null, answers : []};
                                retval.push(currentObj);
                            } 
                            if(currentRow.value.PostTypeId === \"2\"){
                                currentObj.answers.push(currentRow.value);
                            } 
                            else{
                                currentObj.question = currentRow.value;
                            }
                        }
                        send(toJSON(retval));
                    }"
   }
}
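The list function can be exercised offline too. In this sketch `getRow`, `start`, `send` and `toJSON` are stubs standing in for CouchDB's query-server functions, and the view rows are hypothetical, already sorted by key as CouchDB would deliver them:

```javascript
// Offline harness for the "aggregated" list function. getRow, start, send
// and toJSON are stubs for CouchDB's query-server globals; rows are
// hypothetical and pre-sorted by key.
const viewRows = [
  { key: "1", value: { Id: "1", PostTypeId: "1" } },
  { key: "1", value: { Id: "1234", ParentId: "1", PostTypeId: "2" } },
  { key: "7", value: { Id: "7", PostTypeId: "1" } },
];
let cursor = 0;
let output = "";
function getRow() { return cursor < viewRows.length ? viewRows[cursor++] : null; }
function start(resp) { /* response headers are ignored in this harness */ }
function send(chunk) { output += chunk; }
function toJSON(obj) { return JSON.stringify(obj); }

// Same logic as the list function in the design document above.
function aggregated(head, req) {
  start({ headers: { "Content-Type": "application/json" } });
  var currentRow = null;
  var currentObj = null;
  var retval = [];
  while ((currentRow = getRow())) {
    // A new key starts a new question group.
    if (currentObj === null || currentRow.key !== currentObj.key) {
      currentObj = { key: currentRow.key, question: null, answers: [] };
      retval.push(currentObj);
    }
    if (currentRow.value.PostTypeId === "2") {
      currentObj.answers.push(currentRow.value);
    } else {
      currentObj.question = currentRow.value;
    }
  }
  send(toJSON(retval));
}

aggregated({}, {});
console.log(output);
```

Because CouchDB orders rows with equal keys by document ID, a question is not guaranteed to arrive before its answers; the list copes with that by filling `question` wherever it appears in the group. It also collects every group in `retval` and sends a single response at the end, so a very large result is held in memory; sending each completed group as it is finished (writing the array brackets and commas by hand) would let the response stream instead.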

So, once some documents are loaded, you can query the aggregated result like so:

http://localhost:5984/<db>/_design/posts/_list/aggregated/by_question_id?<standard view query parameters>
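View parameters such as `key`, `startkey` and `endkey` take JSON values, so a string key must keep its quotes and then be URL-encoded. A minimal sketch of building such a URL, assuming a database named `stackexchange` (hypothetical) and the `posts` design document above:

```javascript
// Sketch: build a query URL for the "aggregated" list. The host, database
// name and key values are hypothetical. Every parameter is JSON-encoded,
// which suits key, startkey, endkey, limit and descending.
function listUrl(base, db, params) {
  const path = `${base}/${encodeURIComponent(db)}/_design/posts/_list/aggregated/by_question_id`;
  const query = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    // JSON.stringify("1") keeps the quotes: "1"
    query.set(name, JSON.stringify(value));
  }
  const qs = query.toString();
  return qs ? `${path}?${qs}` : path;
}

const url = listUrl("http://localhost:5984", "stackexchange", { key: "1" });
console.log(url);

// With Node 18+ (global fetch) and a running CouchDB, the result could be
// fetched with:
// fetch(url).then((res) => res.json()).then((groups) => console.log(groups));
```

Note that `limit` counts view rows, not aggregated questions, so a limit can cut a question's answers off mid-group; selecting by `key` or by a `startkey`/`endkey` range avoids that.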

I hope this saves people some time.

Upvotes: 3
