Reputation: 93
I want to use mapReduce to perform a group aggregation. Here is my map function:
function() {
    emit(this.TransactionType, { Count: 1 });
}
Here are two reduce functions:
function(key, values) {
    var result = { Count: 0 };
    values.forEach(function(value) {
        result.Count += 1;
    });
    return result;
}
function(key, values) {
    var result = { Count: 0 };
    values.forEach(function(value) {
        result.Count += value.Count;
    });
    return result;
}
and here are the two results. From the first reduce function:
"_id" : "A", "value" : { "Count" : 13.0 }
"_id" : "B", "value" : { "Count" : 2.0 }
"_id" : "C", "value" : { "Count" : 1.0 }
"_id" : "D", "value" : { "Count" : 209.0 }
"_id" : "E", "value" : { "Count" : 66.0 }
"_id" : "F", "value" : { "Count" : 11.0 }
"_id" : "G", "value" : { "Count" : 17.0 }
"_id" : "H", "value" : { "Count" : 17.0 }
From the second reduce function:
"_id" : "A", "value" : { "Count" : 128.0 }
"_id" : "B", "value" : { "Count" : 115.0 }
"_id" : "C", "value" : { "Count" : 1.0 }
"_id" : "D", "value" : { "Count" : 3645.0 }
"_id" : "E", "value" : { "Count" : 1405.0 }
"_id" : "F", "value" : { "Count" : 256.0 }
"_id" : "G", "value" : { "Count" : 380.0 }
"_id" : "H", "value" : { "Count" : 398.0 }
So why are the two results different?
Thank you very much.
Upvotes: 2
Views: 1044
Reputation: 156642
It's helpful to think of the "reduce" function in terms of the "fold" higher-order function. That is to say, your "reduce" function will be applied to a list of values and an accumulated object (the "result" variable in your examples), which starts from an initial value but will eventually be replaced by the output of successive calls to your function. Moreover, the list of values to which your function is applied can be broken up into any number of sub-lists, processed in any order!
For example, consider how your function would behave using the JavaScript Array "reduce" function, which is an example of the "fold" higher-order function. Your first example will behave improperly because it doesn't use the "Count" property of each element. Successive attempts to use it with Array#reduce will fail similarly:
function badReducer(accum, x) {
    accum.Count += 1;
    return accum;
}
var sum = {Count: 0};
sum = [{Count: 1}, {Count: 2}, {Count: 3}].reduce(badReducer, sum);
sum; // => {Count: 3}, d'oh! (should be 6)
sum = [{Count: 4}].reduce(badReducer, sum);
sum; // => {Count: 4}, d'oh! (should be 10)
However, your second example properly adds the "Count" property and can be applied successively to its own output:
function goodReducer(accum, x) {
    accum.Count += x.Count;
    return accum;
}
var sum = {Count: 0};
sum = [{Count: 1}, {Count: 2}, {Count: 3}].reduce(goodReducer, sum);
sum; // => {Count: 6}, woohoo!
sum = [{Count: 4}].reduce(goodReducer, sum);
sum; // => {Count: 10}, woohoo!
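A quick way to test a reducer outside MongoDB (a sketch, using reducers written in MongoDB's `(key, values)` style rather than the fold style above): feeding a reducer's own output back in as a single-element values list should leave the result unchanged, since MongoDB may do exactly that when it re-reduces partial results. The summing reducer passes this check; the counting reducer does not:

```javascript
function badReducer(key, values) {
    var result = { Count: 0 };
    values.forEach(function () { result.Count += 1; });
    return result;
}

function goodReducer(key, values) {
    var result = { Count: 0 };
    values.forEach(function (value) { result.Count += value.Count; });
    return result;
}

var values = [{ Count: 1 }, { Count: 2 }, { Count: 3 }];

// Re-reduce check: reduce(key, [reduce(key, values)]) must equal reduce(key, values).
var once = goodReducer("A", values);        // { Count: 6 }
var twice = goodReducer("A", [once]);       // { Count: 6 }, stable
var badOnce = badReducer("A", values);      // { Count: 3 }
var badTwice = badReducer("A", [badOnce]);  // { Count: 1 }, collapses!
```

The bad reducer collapses any already-reduced value back to the length of the list it sees, which is why its totals come out far too small.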
Upvotes: 0
Reputation: 5548
The reduce function must be written so that it can be re-run several times, using its own output as part of the new input.
The map phase groups the emitted values by key, so the input to your reduce function could be the following:
"A", [{Count: 1}, {Count: 2}, {Count: 3}]
In the first function, Count is only incremented by 1 for each document in the values array, so the output will be:
"A", {Count: 3}
In the second function, the Count values themselves are added, so the output will be:
"A", {Count: 6}
This is what you are experiencing. For a step-by-step walkthrough of how a Map Reduce operation is run, please see the "Extras" section of the MongoDB Cookbook recipe "Finding Max And Min Values with Versioned Documents" http://cookbook.mongodb.org/patterns/finding_max_and_min/
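To see the re-run behaviour concretely, here is a small simulation (plain JavaScript, not MongoDB itself) where the values for key "A" arrive in two batches and the first batch's output is fed into the second reduce call, which is exactly the situation a correct reduce function must handle:

```javascript
function reduceCounts(key, values) {
    var result = { Count: 0 };
    values.forEach(function (value) { result.Count += value.Count; });
    return result;
}

// MongoDB may split the values for one key into batches and re-reduce:
var batch1 = [{ Count: 1 }, { Count: 2 }];
var batch2 = [{ Count: 3 }];

var partial = reduceCounts("A", batch1);                  // { Count: 3 }
var total = reduceCounts("A", [partial].concat(batch2));  // { Count: 6 }
```

Because the summing reducer treats its own output just like any other partial count, the final total is the same no matter how the batches are split.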
Good luck and happy Map Reducing!
Upvotes: 1
Reputation: 35475
The first reduce function does this for each value:
result.Count += 1;
The second one does this:
result.Count += value.Count;
So, if the values list for a key is five partial results, each of the form {Count: 5}, the first function adds +1 for each item and returns {Count: 5}, while the second adds each value.Count and returns {Count: 25}, since 5+5+5+5+5 = 25.
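The same arithmetic can be checked with Array.prototype.reduce (a sketch outside MongoDB, using five partial results of {Count: 5} as the example values list):

```javascript
var values = [5, 5, 5, 5, 5].map(function (n) { return { Count: n }; });

// First reducer: counts the items, ignoring each value.Count.
var countItems = values.reduce(function (acc) {
    return { Count: acc.Count + 1 };
}, { Count: 0 });

// Second reducer: sums the partial counts.
var sumCounts = values.reduce(function (acc, v) {
    return { Count: acc.Count + v.Count };
}, { Count: 0 });

countItems.Count; // 5
sumCounts.Count;  // 25
```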
Upvotes: 1