Reputation: 3659
I'm trying to use map-reduce to understand when this can be helpful.
So I have a collection named "actions" with 100k docs like this:
{
"profile_id":1111,
"action_id":2222
}
Now I'm trying out some map-reduce examples. I want a list of all users with the total number of actions each one has. Is this possible? My code:
db.fbooklikes.mapReduce(
    function(){
        emit(this.profile_id, this.action_id);
    },
    function(keyProfile, valueAction){
        return Array.sum(valueAction);
    },
    {
        out:"example"
    }
)
This is not working. The result is:
"counts" : {
"input" : 100000,
"emit" : 100000,
"reduce" : 1146,
"output" : 13
},
"ok" : 1,
"_o" : {
"result" : "map_reduce_example",
"timeMillis" : 2539,
"counts" : {
"input" : 100000,
"emit" : 100000,
"reduce" : 1146,
"output" : 13
},
"ok" : 1
},
Is what I'm trying to do possible with map-reduce?
Upvotes: 1
Views: 177
Reputation: 125488
You don't want to sum the action ids; you want to count them. So you want something like the following:
var map = function () {
    emit(this.profile_id, { action_ids : [this.action_id], count : 1 });
};
var reduce = function(profile_id, values) {
    // Start from an empty result with the same shape as the emitted values
    var value = { action_ids: [], count: 0 };
    for (var i = 0; i < values.length; i++) {
        value.count += values[i].count;
        value.action_ids.push.apply(value.action_ids, values[i].action_ids);
    }
    return value;
};
db.fbooklikes.mapReduce(map, reduce, { out:"example" });
This will give you an array of action ids and a count for each profile id. The count could be obtained by accessing the length of the action_ids array, but I thought I would keep it separate to make the example clearer.
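To see the shape of the output without a database, here is a minimal sketch in plain JavaScript. The emit function and emittedByKey object are hypothetical stand-ins for what mapReduce does internally, and the sample documents are made up; only map and reduce come from the answer above.

```javascript
// Hypothetical stand-in for mapReduce's emit(): collect values per key.
var emittedByKey = {};
function emit(key, value) {
  (emittedByKey[key] = emittedByKey[key] || []).push(value);
}

var map = function () {
  emit(this.profile_id, { action_ids: [this.action_id], count: 1 });
};

var reduce = function (profile_id, values) {
  var value = { action_ids: [], count: 0 };
  for (var i = 0; i < values.length; i++) {
    value.count += values[i].count;
    value.action_ids.push.apply(value.action_ids, values[i].action_ids);
  }
  return value;
};

// Made-up sample documents in the shape shown in the question.
var sampleDocs = [
  { profile_id: 1111, action_id: 2222 },
  { profile_id: 1111, action_id: 3333 },
  { profile_id: 4444, action_id: 2222 }
];

// Run map with each document bound to `this`, as mapReduce does.
sampleDocs.forEach(function (doc) { map.call(doc); });

var results = {};
Object.keys(emittedByKey).forEach(function (key) {
  var values = emittedByKey[key];
  // MongoDB does not call reduce for a key that has only a single value.
  results[key] = values.length > 1 ? reduce(key, values) : values[0];
});

console.log(JSON.stringify(results));
```

In this sketch the count for each profile id always equals the length of its action_ids array, which is the redundancy mentioned above.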
Upvotes: 2
Reputation: 151072
Well, yes, you can use it, but the more refined response is that there are likely better tools for doing what you want.
MapReduce is handy for some tasks, but usually best suited when something else does not apply. The inclusion of mapReduce in MongoDB pre-dates the introduction of the aggregation framework, which is generally what you should be using when you can:
db.fbooklikes.aggregate([
    { "$group": {
        "_id": "$profile_id",
        "count": { "$sum": 1 }
    }}
])
This simply returns a count of the documents in the collection for each distinct value of "profile_id".
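For readers unfamiliar with "$group", the computation is roughly the following, sketched in plain JavaScript. countByProfile is a hypothetical helper, not a MongoDB API, and the sample documents are made up. It only illustrates the logic; the real stage runs in native code on the server and does not guarantee any output order.

```javascript
// Made-up sample documents in the shape shown in the question.
var docs = [
  { profile_id: 1111, action_id: 2222 },
  { profile_id: 1111, action_id: 3333 },
  { profile_id: 4444, action_id: 2222 }
];

// Hypothetical helper mirroring { "$group": { "_id": "$profile_id", "count": { "$sum": 1 } } }.
function countByProfile(documents) {
  var counts = {};
  documents.forEach(function (doc) {
    // Add 1 for every document seen with this profile_id.
    counts[doc.profile_id] = (counts[doc.profile_id] || 0) + 1;
  });
  // Shape the result like the stage output: one { _id, count } per key.
  return Object.keys(counts).map(function (key) {
    return { _id: Number(key), count: counts[key] };
  });
}

var grouped = countByProfile(docs);
console.log(JSON.stringify(grouped));
```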
MapReduce requires JavaScript evaluation and therefore runs much slower than the native code functions implemented by the aggregation framework. Sometimes you have to use it, but in simple cases it is best not to, and there are some quirks that you need to understand:
db.fbooklikes.mapReduce(
    function(){
        emit(this.profile_id, 1);
    },
    function(key,values){
        return Array.sum(values);
    },
    {
        out: { "inline": 1 }
    }
)
The biggest thing people miss with mapReduce is the fact that the reducer is almost never called just once per emitted key. In fact it will process output in "chunks", thus "reducing" down part of that output and placing it back to be "reduced" again against other output until there is only a single value for that key.
For this reason it is important that the reduce function returns the same type of data that the "map" function emits. It's a sticking point that can lead to weird results when you don't understand that part of the process. It is in fact the underlying mechanism that lets mapReduce handle a large number of values for a single key and reduce them.
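The effect of chunked reduction can be sketched in plain JavaScript without MongoDB. Everything here is an illustration under assumptions: simulateReduce and its chunkSize parameter are hypothetical and do not reflect how MongoDB actually sizes its chunks, and sum stands in for the shell's Array.sum. The sketch contrasts a reducer that returns the same kind of value it receives with one that does not.

```javascript
// Stand-in for the mongo shell's Array.sum.
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Safe to re-reduce: returns a number that can be summed again later.
function goodReduce(key, values) {
  return sum(values);
}

// Not safe to re-reduce: counts array elements, so a partial result
// from an earlier pass is counted as 1 instead of its real total.
function badReduce(key, values) {
  return values.length;
}

// Hypothetical simulation: reduce in chunks and feed the partial results
// back in until one value remains. chunkSize must be at least 2.
function simulateReduce(reduceFn, key, values, chunkSize) {
  var current = values.slice();
  while (current.length > 1) {
    var next = [];
    for (var i = 0; i < current.length; i += chunkSize) {
      next.push(reduceFn(key, current.slice(i, i + chunkSize)));
    }
    current = next;
  }
  return current[0];
}

// Ten emits of 1 for the same profile_id, as the map function above produces.
var emitted = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1];

var goodResult = simulateReduce(goodReduce, 1111, emitted, 4);
var badResult = simulateReduce(badReduce, 1111, emitted, 4);
var badSinglePass = badReduce(1111, emitted);

console.log(goodResult);    // 10
console.log(badResult);     // 3
console.log(badSinglePass); // 10
```

The broken reducer looks correct when it happens to be called once, which is why this kind of bug tends to show up only on larger collections.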
But generally speaking, you should be using the aggregation framework where possible. Where a problem requires some special calculation that would not be possible there, or involves complex document traversal that you need to inspect with JavaScript, that is where you use mapReduce.
Upvotes: 3