Reputation: 3480
Alright SO users. I am trying to learn and use CouchDB. I have the StackExchange data export loaded as one document per row from the XML file, so the documents in Couch look basically like this:
//This is a representation of a question:
{
"Id" : "1",
"PostTypeId" : "1",
"Body" : "..."
}
//This is a representation of an answer
{
"Id" : "1234",
"ParentId" : "1",
"PostTypeId" : "2",
"Body" : "..."
}
(Please ignore the fact that the import of these documents basically treated all the attributes as text, I understand that using real numbers, bools, etc. could yield better space/processing efficiency.)
What I'd like to do is map each question and its answers into a single aggregate document.
Here's my map:
function(doc) {
if(doc.PostTypeId === "2"){
emit(doc.ParentId, doc);
}
else{
emit(doc.Id, doc);
}
}
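To see what this map produces, here is a minimal sketch that runs it outside CouchDB against two sample documents. The `emit` stub and the sample data are assumptions made for illustration; inside CouchDB, `emit` is supplied by the view server.

```javascript
// Stand-in for CouchDB's emit(); it only records the rows (assumption for local testing).
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map function from above, unchanged in behavior.
function map(doc) {
  if (doc.PostTypeId === "2") {
    emit(doc.ParentId, doc); // answers are keyed by their question's Id
  } else {
    emit(doc.Id, doc); // questions are keyed by their own Id
  }
}

// Hypothetical sample documents shaped like the ones shown earlier.
const docs = [
  { Id: "1", PostTypeId: "1", Body: "..." },
  { Id: "1234", ParentId: "1", PostTypeId: "2", Body: "..." }
];
docs.forEach(function (doc) { map(doc); });

// Print only the keys of the emitted rows.
console.log(rows.map(function (r) { return r.key; }));
```

Both rows come out under the key "1", so the question and its answer end up next to each other in the view.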
And here's the reduce:
function(keys, values, rereduce){
var retval = {question: null, answers : []};
if(rereduce){
for(var i in values){
var current = values[i];
retval.answers = retval.answers.concat(current.answers);
if(retval.question === null && current.question !== null){
retval.question = current.question;
}
}
}
else{
for(var i in values){
var current = values[i];
if(current.PostTypeId === "2"){
retval.answers.push(current);
}
else{
retval.question = current;
}
}
}
return retval;
}
Theoretically, this would yield a document like this:
{
"question" : {...},
"answers" : [answer1, answer2, answer3]
}
But instead I am getting the standard "does not reduce fast enough" error.
Am I using Map-Reduce incorrectly? Is there a well-established pattern for how to accomplish this in CouchDB?
(Please also note that I would like a response with the complete documents, where the question is the "parent" and the answers are the "children", not just the Ids.)
Upvotes: 2
Views: 1058
Reputation: 3480
So, the "right" way to accomplish what I'm trying to do above is to add a "list" function to my design document. (What I was trying to achieve appears to be called "collating documents".)
At any rate, you can configure your map however you like and combine it with a "list" in the same design document.
To solve the above question, I eliminated my reduce (leaving only a map function) and then added a list function, giving a design document like the following:
{
"_id": "_design/posts",
"_rev": "11-8103b7f3bd2552a19704710058113b32",
"language": "javascript",
"views": {
"by_question_id": {
"map": "function(doc) {
if(doc.PostTypeId === \"2\"){
emit(doc.ParentId, doc);
}
else{
emit(doc.Id, doc);
}
}"
}
},
"lists": {
"aggregated": "function(head, req){
start({\"headers\": {\"Content-Type\": \"application/json\"}});
var currentRow = null;
var currentObj = null;
var retval = [];
while(currentRow = getRow()){
if(currentObj === null || currentRow.key !== currentObj.key){
currentObj = {key: currentRow.key, question : null, answers : []};
retval.push(currentObj);
}
if(currentRow.value.PostTypeId === \"2\"){
currentObj.answers.push(currentRow.value);
}
else{
currentObj.question = currentRow.value;
}
}
send(toJSON(retval));
}"
}
}
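The grouping done by the list function can be tried outside CouchDB. The following is only a sketch of the same loop in plain JavaScript: `aggregate` and the sample rows are made up for illustration, and the rows are assumed to arrive sorted by key, as a view returns them.

```javascript
// Groups view rows (assumed sorted by key) into one object per question.
// Mirrors the loop in the "aggregated" list function, with an array of rows
// standing in for repeated getRow() calls.
function aggregate(viewRows) {
  const result = [];
  let current = null;
  for (const row of viewRows) {
    // A new key means a new question group starts here.
    if (current === null || row.key !== current.key) {
      current = { key: row.key, question: null, answers: [] };
      result.push(current);
    }
    if (row.value.PostTypeId === "2") {
      current.answers.push(row.value);
    } else {
      current.question = row.value;
    }
  }
  return result;
}

// Hypothetical rows, shaped like the output of the by_question_id view.
const sampleRows = [
  { key: "1", value: { Id: "1", PostTypeId: "1" } },
  { key: "1", value: { Id: "1234", ParentId: "1", PostTypeId: "2" } },
  { key: "2", value: { Id: "2", PostTypeId: "1" } }
];

const aggregated = aggregate(sampleRows);
console.log(JSON.stringify(aggregated, null, 2));
```

Because the rows for one question are adjacent, a single pass is enough and no reduce is needed.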
So, after you have some elements loaded up, you can access them like so:
http://localhost:5984/<db>/_design/posts/_list/aggregated/by_question_id?<standard view limiters>
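View parameters such as `key`, `startkey` and `endkey` are JSON values, so a string key has to be sent with its quotes and then URL-encoded. Here is a small sketch of building such a URL; the database name `stackexchange` and the helper `listUrl` are made up for illustration.

```javascript
// Builds the _list URL for a single question. The host, port and path layout
// follow the URL above; the database name passed in is an assumption.
function listUrl(db, questionId) {
  const base = "http://localhost:5984/" + encodeURIComponent(db) +
    "/_design/posts/_list/aggregated/by_question_id";
  // View keys are JSON: the string 1 must be sent as "1", quotes included.
  const key = encodeURIComponent(JSON.stringify(questionId));
  return base + "?key=" + key;
}

const exampleUrl = listUrl("stackexchange", "1");
console.log(exampleUrl);
```

Since the imported Ids are strings, the key is quoted; with numeric Ids the quotes would be dropped.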
I hope this saves people some time.
Upvotes: 3