Reputation: 16195
I am experimenting with map-reduce in mongo and have run into a numerical problem that has me completely stumped. Given the following map and reduce functions:
var map = function() {
    var key = "awesome";
    emit(key, {count: 1});
};
var reduce = function(key, values) {
    var result = {count: 0};
    values.forEach(function(value) {
        result.count += value.count;
    });
    result.countBy2 = result.count/2;
    // result.count = result.count/2
    return result;
};
gives the expected output:
"results" : [
{
"_id" : "awesome",
"value" : {
"count" : 7696.0000000000000000,
"countBy2" : 3848.0000000000000000
}
},
Uncommenting the commented-out line in the reduce function above gives very curious output:
"results" : [
{
"_id" : "awesome",
"value" : {
"count" : 98.0000000000000000,
"countBy2" : 98.0000000000000000
}
},
Why?
Swapping which of the two lines is commented out, so that the objects emitted by map and returned by reduce have the same format (associative?):
// result.countBy2 = result.count/2
result.count = result.count/2
still gives unexpected output:
"results" : [
{
"_id" : "awesome",
"value" : {
"count" : 98.0000000000000000
}
}
What am I missing?
Upvotes: 2
Views: 69
Reputation: 311865
When your reduce function includes the line that divides count by 2, it violates the idempotence requirement that reduce functions must adhere to, as described in the docs:
the reduce function must be idempotent. Ensure that the following statement is true:
reduce( key, [ reduce(key, valuesArray) ] ) == reduce( key, valuesArray )
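That basically says that you need to be able to feed the output of one reduce call back as input into another call and have the result stay the same. MongoDB can call reduce more than once for the same key, passing in the results of earlier reduce calls, so with the division in place the partial counts get halved repeatedly.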
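The check from the docs can be tried in plain JavaScript, outside MongoDB. This is only a sketch: reduceSum and reduceHalved are hypothetical names for the two variants of the reduce function in the question, and the ten-element input is made up for illustration.

```javascript
// Variant that only sums the counts: safe to re-reduce.
function reduceSum(key, values) {
  var result = { count: 0 };
  values.forEach(function (value) {
    result.count += value.count;
  });
  return result;
}

// Variant from the question with the division enabled.
function reduceHalved(key, values) {
  var result = reduceSum(key, values);
  result.count = result.count / 2;
  return result;
}

// Made-up input: ten emitted values of {count: 1}.
var values = [];
for (var i = 0; i < 10; i++) {
  values.push({ count: 1 });
}

// Compare reduce(key, [reduce(key, values)]) with reduce(key, values).
function isIdempotent(reduceFn) {
  var once = reduceFn("awesome", values);
  var twice = reduceFn("awesome", [reduceFn("awesome", values)]);
  return once.count === twice.count;
}

console.log(isIdempotent(reduceSum));    // true
console.log(isIdempotent(reduceHalved)); // false
```

With the division in place, the second pass halves an already-halved count (5 becomes 2.5 here), which is why the two sides of the check disagree.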
If you want to perform final processing on the output of your map-reduce, then you can include a finalize function in the options that will only be called once. Depending on what you're ultimately trying to do, that's likely where you should be dividing your count by 2:
finalize: function(key, reducedValue) { return { count: reducedValue.count/2 }; }
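To see why moving the division into finalize helps, here is a rough simulation in plain JavaScript. It is a sketch under assumptions: runInBatches is a hypothetical driver, and the batch sizes are arbitrary, since how MongoDB actually groups values between reduce calls is internal to the server. The point is only that the sum survives any amount of re-reducing, and the division then happens exactly once.

```javascript
// Reduce that only sums, so it can safely be applied to its own output.
function reduce(key, values) {
  var result = { count: 0 };
  values.forEach(function (value) {
    result.count += value.count;
  });
  return result;
}

// Same finalize as above: runs once on the fully reduced value.
function finalize(key, reducedValue) {
  return { count: reducedValue.count / 2 };
}

// Hypothetical driver (not MongoDB's real algorithm): reduce in batches,
// re-reduce the partial results until one value is left, then finalize.
// batchSize must be at least 2.
function runInBatches(key, values, batchSize) {
  var partials = values;
  while (partials.length > 1) {
    var next = [];
    for (var i = 0; i < partials.length; i += batchSize) {
      next.push(reduce(key, partials.slice(i, i + batchSize)));
    }
    partials = next;
  }
  return finalize(key, partials[0]);
}

// 7696 emitted values of {count: 1}, matching the count in the question.
var emitted = [];
for (var n = 0; n < 7696; n++) {
  emitted.push({ count: 1 });
}

console.log(runInBatches("awesome", emitted, 100).count); // 3848
console.log(runInBatches("awesome", emitted, 7).count);   // 3848
```

In the shell, the finalize function goes in the options argument of mapReduce, for example db.collection.mapReduce(map, reduce, { out: { inline: 1 }, finalize: finalize }), where collection stands in for your own collection name.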
Upvotes: 2