Reputation: 1457
Given this document format:
{
    "_id" : ObjectId("55e99afda8deab702bb51001"),
    "shippingStatus" : "",
    "skuOwner" : ObjectId("55e99afd670a4c5b16e2a6ec")
}
Here is the mapReduce I am trying to run:
inventory_map = function() {
    var values = {
        inventory: this._id,
        count: 1
    };
    emit(this.skuOwner, values);
};
reduce = function(key, values) {
    var result = {
        "openCount": 0,
        "inventory": []
    };
    values.forEach(function(value) {
        result.openCount += 1;
        if (value.inventory !== null) { result.inventory.push(value.inventory); }
    });
    return result;
}
res = db.inventories.mapReduce(inventory_map, reduce, {out: 'openInventory', query: {shippingStatus: {$ne: 'SHIPPED'}}});
Here are the results
I would expect every one of my output documents to conform to the result object I specified in reduce, but this does not seem to be the case. Can someone explain why I am seeing this behavior?
Upvotes: 0
Views: 60
Reputation: 50406
This is the same old basic problem. It is really hard to mark these as "duplicate" because every implementation is different, yet the cause of the problem is always the same.
You are using the wrong method here anyway, but please read on to find out how to do it right.
When reading up on mapReduce, you basically missed this vital piece of information:
MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.
And also later:
the type of the return object must be identical to the type of the value emitted by the map function.
What that means, and what you are doing wrong here, is that your "mapper" emits data in a completely different shape from what your "reducer" returns. Because the reducer can take the "previous output from the reduce function" as input and "reduce again", this is where everything fails.
To clarify, "reduce" is not "all or nothing" but an "incremental" approach: not all of the values for a common key are necessarily presented to the function "all at once". Instead, only a "subset" of the values may be presented, and the returned output can be "fed into reduce" yet again. This is how MongoDB deals with "big data" results, by processing them in "chunks" rather than all at once.
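To see the failure without a server, here is a minimal plain-JavaScript sketch (runnable in Node.js, not the mongo shell). The emit collector, the sample _id/skuOwner strings, and the two-chunk split are hypothetical stand-ins; MongoDB itself decides how values are chunked. It runs the question's original map and reduce, first reducing all values at once and then re-reducing a partial result:

```javascript
// Hypothetical sample documents (plain strings instead of ObjectIds).
const docs = [
  { _id: "inv1", skuOwner: "ownerA" },
  { _id: "inv2", skuOwner: "ownerA" },
  { _id: "inv3", skuOwner: "ownerA" },
];

// The question's original map and reduce, same logic.
function inventory_map() {
  emit(this.skuOwner, { inventory: this._id, count: 1 });
}

function reduce(key, values) {
  const result = { openCount: 0, inventory: [] };
  values.forEach(function (value) {
    result.openCount += 1;
    if (value.inventory !== null) { result.inventory.push(value.inventory); }
  });
  return result;
}

// Stand-in for the shell's global emit(): collect values per key.
const emitted = {};
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}
docs.forEach(function (doc) { inventory_map.call(doc); });

const values = emitted.ownerA;
// Every value in a single reduce call.
const allAtOnce = reduce("ownerA", values);
// Reduce two values, then feed that partial result back into reduce.
const partial = reduce("ownerA", values.slice(0, 2));
const reReduced = reduce("ownerA", [partial, values[2]]);

console.log(JSON.stringify(allAtOnce)); // {"openCount":3,"inventory":["inv1","inv2","inv3"]}
console.log(JSON.stringify(reReduced)); // {"openCount":2,"inventory":[["inv1","inv2"],"inv3"]}
```

The re-reduced result undercounts (openCount is 2, not 3) and nests an array inside inventory, which is exactly the mismatch described above.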
Fixing this is generally as simple as making the "mapper" emit the same shape that the "reducer" expects as "input" and itself returns as "output". A few simple changes make all the difference here:
inventory_map = function() {
    var values = {
        inventory: [this._id],
        openCount: 1 // all we changed on both
    };
    emit(this.skuOwner, values);
};
reduce = function(key, values) {
    var result = {
        "openCount": 0,
        "inventory": []
    };
    values.forEach(function(value) {
        result.openCount += value.openCount; // and that too
        result.inventory = result.inventory.concat(value.inventory); // and merge arrays instead of pushing
    });
    return result;
}
Now everything is the same from "output" of both "mapper" and "reducer" and the "reducer" also respects the same things as "input", so it works.
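As a quick sanity check, the same kind of plain-JavaScript simulation (Node.js, with hypothetical sample data and a stand-in emit collector) shows the corrected pair giving the same answer whether reduce sees all values at once or re-reduces a partial result:

```javascript
// Hypothetical sample documents.
const docs = [
  { _id: "inv1", skuOwner: "ownerA" },
  { _id: "inv2", skuOwner: "ownerA" },
  { _id: "inv3", skuOwner: "ownerA" },
];

// Stand-in for the shell's global emit(): collect values per key.
const emitted = {};
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}

// Corrected map: emits the same shape that reduce returns.
function inventory_map() {
  emit(this.skuOwner, { inventory: [this._id], openCount: 1 });
}

// Corrected reduce: sums partial counts and concatenates partial arrays.
function reduce(key, values) {
  const result = { openCount: 0, inventory: [] };
  values.forEach(function (value) {
    result.openCount += value.openCount;
    result.inventory = result.inventory.concat(value.inventory);
  });
  return result;
}

docs.forEach(function (doc) { inventory_map.call(doc); });
const values = emitted.ownerA;

const allAtOnce = reduce("ownerA", values);
const reReduced = reduce("ownerA", [reduce("ownerA", values.slice(0, 2)), values[2]]);

console.log(JSON.stringify(allAtOnce)); // {"openCount":3,"inventory":["inv1","inv2","inv3"]}
console.log(JSON.stringify(reReduced)); // {"openCount":3,"inventory":["inv1","inv2","inv3"]}
```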
On the "other hand", it really seems like you "should" be using .aggregate() instead, since the operations are very simple and also run "a lot faster" than mapReduce: the aggregation operators are all natively coded rather than interpreted as JavaScript:
db.inventories.aggregate([
    { "$match": { "shippingStatus": { "$ne": "SHIPPED" } } },
    { "$group": {
        "_id": "$skuOwner",
        "inventory": { "$push": "$_id" },
        "count": { "$sum": 1 }
    }}
])
Much simpler, a lot faster and more concise. Learn it well.
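For readers new to $group, the following plain-JavaScript model (not MongoDB code, and not how the server implements it) sketches what a shippingStatus $ne 'SHIPPED' filter followed by $group with $push: "$_id" and $sum: 1 computes per skuOwner. The sample documents and the groupOpenInventory name are hypothetical. MongoDB does not guarantee $group output order, while this model returns groups in first-seen order:

```javascript
// Hypothetical sample documents shaped like the question's collection.
const inventories = [
  { _id: "inv1", shippingStatus: "", skuOwner: "ownerA" },
  { _id: "inv2", shippingStatus: "SHIPPED", skuOwner: "ownerA" },
  { _id: "inv3", shippingStatus: "", skuOwner: "ownerA" },
  { _id: "inv4", shippingStatus: "", skuOwner: "ownerB" },
];

// Rough model of:
//   { $match: { shippingStatus: { $ne: "SHIPPED" } } },
//   { $group: { _id: "$skuOwner", inventory: { $push: "$_id" }, count: { $sum: 1 } } }
function groupOpenInventory(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (doc.shippingStatus === "SHIPPED") continue; // the $ne filter
    let group = groups.get(doc.skuOwner);
    if (!group) {
      group = { _id: doc.skuOwner, inventory: [], count: 0 };
      groups.set(doc.skuOwner, group);
    }
    group.inventory.push(doc._id); // $push: "$_id"
    group.count += 1;              // $sum: 1
  }
  return [...groups.values()];
}

console.log(JSON.stringify(groupOpenInventory(inventories)));
// [{"_id":"ownerA","inventory":["inv1","inv3"],"count":2},{"_id":"ownerB","inventory":["inv4"],"count":1}]
```

Unlike the mapReduce version, there is no separate reduce step whose input and output shapes have to agree, which is a large part of why the pipeline is harder to get wrong.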
Upvotes: 2
Reputation: 69663
An important requirement of MapReduce is that the output format of the map function and the output format of the reduce function are identical. This is not the case in your code. Your map output format is:
{
    inventory: this._id,
    count: 1
};
and your reduce output format is:
{
    openCount: 0,
    inventory: []
};
These formats must be identical because when map provides only one value for a key, that result MIGHT not be passed to reduce at all and instead goes directly to the output. Also, any of the results from reduce MIGHT be put into another round of reduce together with previously unprocessed results (this usually only happens when processing very large datasets or when you process data from multiple shards).
The results that still have a count field, and whose inventory is still a single value rather than an array, are the ones that were never passed to your reduce function.
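The pass-through behavior can be illustrated with a miniature, hypothetical mapReduce runner in plain JavaScript. It is only a model: runMapReduce is an invented helper, it passes emit to the map function as an argument (the real shell provides emit globally), and it always skips reduce for single-value keys, whereas MongoDB only might skip it. With the question's original functions, the key backed by a single document keeps the map's shape:

```javascript
// Hypothetical runner: skips reduce when a key has exactly one emitted value.
function runMapReduce(docs, map, reduce) {
  const emitted = new Map();
  const emit = function (key, value) {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs.forEach(function (doc) { map.call(doc, emit); });
  const out = {};
  for (const [key, values] of emitted) {
    out[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return out;
}

// The question's original functions (emit passed in for this model).
function originalMap(emit) {
  emit(this.skuOwner, { inventory: this._id, count: 1 });
}
function originalReduce(key, values) {
  const result = { openCount: 0, inventory: [] };
  values.forEach(function (value) {
    result.openCount += 1;
    if (value.inventory !== null) { result.inventory.push(value.inventory); }
  });
  return result;
}

const docs = [
  { _id: "inv1", skuOwner: "ownerA" },
  { _id: "inv2", skuOwner: "ownerA" },
  { _id: "inv3", skuOwner: "ownerB" }, // only one document for ownerB
];

const out = runMapReduce(docs, originalMap, originalReduce);
console.log(JSON.stringify(out));
// {"ownerA":{"openCount":2,"inventory":["inv1","inv2"]},"ownerB":{"inventory":"inv3","count":1}}
```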
To fix this issue, modify your map function to return results which look identical to the output of your reduce function:
inventory_map = function() {
    var value = {
        inventory: [ this._id ],
        openCount: 1
    };
    emit(this.skuOwner, value);
};
and modify your reduce function accordingly:
reduce = function(key, values) {
    var result = {
        "openCount": 0,
        "inventory": []
    };
    values.forEach(function(value) {
        result.openCount += value.openCount; // <-- changed: add the partial count instead of 1
        if (value.inventory !== null) {
            result.inventory = result.inventory.concat(value.inventory); // <-- changed: merge arrays instead of push
        }
    });
    return result;
}
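One way to catch this class of bug before running on real data is to check, in plain JavaScript, that reducing partial results gives the same answer as reducing everything at once. The helper survivesReReduce and the sample ids are hypothetical, and the check only tries two-way splits in the original order plus re-reducing a single reduced value, which is a small subset of what MongoDB may actually do:

```javascript
// The corrected reduce from this answer, including its null check.
function reduce(key, values) {
  const result = { openCount: 0, inventory: [] };
  values.forEach(function (value) {
    result.openCount += value.openCount;
    if (value.inventory !== null) {
      result.inventory = result.inventory.concat(value.inventory);
    }
  });
  return result;
}

// The question's original reduce, for comparison.
function originalReduce(key, values) {
  const result = { openCount: 0, inventory: [] };
  values.forEach(function (value) {
    result.openCount += 1;
    if (value.inventory !== null) { result.inventory.push(value.inventory); }
  });
  return result;
}

// For every two-way split (in original order), reduce each half, reduce the
// two partial results together, and compare with a single reduce call.
function survivesReReduce(reduceFn, key, values) {
  const expected = JSON.stringify(reduceFn(key, values));
  for (let i = 1; i < values.length; i++) {
    const left = reduceFn(key, values.slice(0, i));
    const right = reduceFn(key, values.slice(i));
    if (JSON.stringify(reduceFn(key, [left, right])) !== expected) return false;
  }
  // Re-reducing a single already-reduced value must leave it unchanged.
  return JSON.stringify(reduceFn(key, [reduceFn(key, values)])) === expected;
}

const ids = ["inv1", "inv2", "inv3", "inv4"];
// Values shaped like each version's emit().
const fixedValues = ids.map(function (id) { return { inventory: [id], openCount: 1 }; });
const originalValues = ids.map(function (id) { return { inventory: id, count: 1 }; });

console.log(survivesReReduce(reduce, "ownerA", fixedValues));            // true
console.log(survivesReReduce(originalReduce, "ownerA", originalValues)); // false
```

The corrected reduce passes; the question's original reduce fails on the very first split.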
By the way, a simpler way to solve your issue might be an aggregation:
db.inventories.aggregate([
    { $match: {
        shippingStatus: { $ne: 'SHIPPED' }
    }},
    { $group: {
        _id: "$skuOwner",
        openCount: { $sum: 1 },
        inventory: { $push: "$_id" }
    }},
    { $out: "openInventory" }
]);
Upvotes: -1