Reputation: 761
I Have the following Collection :
/* 0 */
{
"clientID" : ObjectId("51b9c10d91d1a3a52b0000b8"),
"_id" : ObjectId("532b4f1cb3d2eacb1300002b"),
"answers" : [],
"questions" : []
}
/* 1 */
{
"clientID" : ObjectId("51b9c10d91d1a3a52b0000b8"),
"_id" : ObjectId("532b6b9eb3d2eacb1300002c"),
"answers" : [
"1",
"8"
],
"questions" : [
"1",
"2",
"3"
]
}
/* 2 */
{
"clientID" : ObjectId("51b9c10d91d1a3a52b0000b8"),
"_id" : ObjectId("532b6baeb3d2eacb1300002d"),
"answers" : [
"1",
"8"
],
"questions" : [
"1",
"2",
"3"
]
}
/* 3 */
{
"clientID" : ObjectId("5335f9d864e2b1290c00012e"),
"_id" : ObjectId("533b828146ca43634000002d"),
"answers" : [
"ORANGE"
],
"questions" : [
"Color"
]
}
/* 4 */
{
"clientID" : ObjectId("5335f9d864e2b1290c00012e"),
"_id" : ObjectId("5351be327b539a4d1a00002b"),
"answers" : [
"ORANGE"
],
"questions" : [
"Color"
]
}
/* 5 */
{
"clientID" : ObjectId("5335f9d864e2b1290c00012e"),
"_id" : ObjectId("5351be5ec89d717d1a00002b"),
"answers" : [
"ORANGE"
],
"questions" : [
"Color"
]
}
I am running the following code in order to find how many times the (questions,answers) combination appears in the collection:
o.map= function(){
emit({"questions" : this.questions, "answers" :this.answers },this.clientID)
};
o.reduce = function(answers, collection){
return collection.length;
};
logSearchDB.mapReduce(o,function (err, results) {
results.sort(function(a, b){return b.value-a.value});
for (var i = 0; i < results.length; i++) {
console.log(JSON.stringify(results[i]))
};
})
The output is:
{"_id":{"questions":[],"answers":[]},"value":"51b9c10d91d1a3a52b0000b8"}
{"_id":{"questions":["Color"],"answers":["ORANGE"]},"value":3}
{"_id":{"questions":["1","2","3"],"answers":["1","8"]},"value":2}
I expected that the first row will have "value" : 1
I guess the 'reduce' function got a 'collection' object : "51b9c10d91d1a3a52b0000b8", instead of getting an array : ["51b9c10d91d1a3a52b0000b8"].
Why the map reduce doesn't collect everything into an array?
Upvotes: 0
Views: 1605
Reputation: 151190
The reason why you have just a plain value in that first row is because there was only one occurrence of your key value. This is generally how mapReduce works, at least in the way it was specified in the original papers.
So the reduce function is not actually called when there only is a single key. To work around this you use the finalize
function in your map reduce:
var finalize = function(key,value) {
if ( typeof(value) != "number" )
value = 1;
return value;
};
db.collection.mapReduce(
mapper,
reducer,
{
"finalize": finalize,
"out": { "inline": 1 }
}
);
That runs over all of the output and sees that when the value is seen to be not a nunber, being the clientID
you are emitting, then the value is set at 1
because that is how hany are in the grouping.
Really your query is better suited to the aggregation framework than mapReduce. The aggregation framework is a native code implementation as opposed to using a JavaScript interpreter. It runs much faster than mapReduce:
db.collection.aggregate([
{ "$group": {
"_id": {
"questions": "$questions",
"answers": "$answers"
},
"count": { "$sum": 1 }
}}
])
So it is the better option to use. It was a later introduction to MongoDB so people still tend to think in terms of mapReduce or otherwise there is legacy code from earlier versions of MongoDB. But this has been around for quite a while now.
Also see the operator reference for the aggregation framework.
Upvotes: 1