Reputation: 3443
So this is strange. I'm trying to use mapreduce to group datetime/metrics under a unique port:
Document layout:
{
    "_id" : ObjectId("5069d68700a2934015000000"),
    "port_name" : "CL1-A",
    "metric" : "340.0",
    "port_number" : "0",
    "datetime" : ISODate("2012-09-30T13:44:00Z"),
    "array_serial" : "12345"
}
and mapreduce functions:
var query = {
    'array_serial' : array,
    'port_name' : { $in : ports },
    'datetime' : { $gte : from, $lte : to }
};
var map = function() {
    emit( { portname : this.port_name },
          { datetime : this.datetime, metric : this.metric });
};

var reduce = function(key, values) {
    var res = { dates : [], metrics : [], count : 0 };
    values.forEach(function(value){
        res.dates.push(value.datetime);
        res.metrics.push(value.metric);
        res.count++;
    });
    return res;
};
var command = {
    mapreduce : collection,
    map : map.toString(),
    reduce : reduce.toString(),
    query : query,
    out : { inline : 1 }
};

mongoose.connection.db.executeDbCommand(command, function(err, dbres){
    if (err) throw err;
    console.log(dbres.documents);
    res.json(dbres.documents[0].results);
});
If a small number of records is requested, say 5, 10, or even 60, I get back all the data I'm expecting. Larger queries return truncated values.
I just did some more testing, and it seems like the record output is being limited to 100. This is minutely data, so when I run a query for a 24-hour period I would expect 1440 records back; I just ran it and received 80. :\
Is this expected? I'm not specifying a limit anywhere that I can tell.
More data:
A query for records from 2012-10-01T23:00 to 2012-10-02T00:39 (100 minutes) returns correctly:
[
    {
        "_id": {
            "portname": "CL1-A"
        },
        "value": {
            "dates": [
                "2012-10-01T23:00:00.000Z",
                "2012-10-01T23:01:00.000Z",
                "2012-10-01T23:02:00.000Z",
                ...cut...
                "2012-10-02T00:37:00.000Z",
                "2012-10-02T00:38:00.000Z",
                "2012-10-02T00:39:00.000Z"
            ],
            "metrics": [
                "1596.0",
                "1562.0",
                "1445.0",
                ...cut...
                "774.0",
                "493.0",
                "342.0"
            ],
            "count": 100
        }
    }
]
...add one more minute to the query, 2012-10-01T23:00 to 2012-10-02T00:40 (101 minutes):
[
    {
        "_id": {
            "portname": "CL1-A"
        },
        "value": {
            "dates": [
                null,
                "2012-10-02T00:40:00.000Z"
            ],
            "metrics": [
                null,
                "487.0"
            ],
            "count": 2
        }
    }
]
The dbres.documents object shows the correct expected emitted records:
[ { results: [ [Object] ],
    timeMillis: 8,
    counts: { input: 101, emit: 101, reduce: 2, output: 1 },
    ok: 1 } ]
...so is the data getting lost somewhere?
Upvotes: 5
Views: 3724
Reputation: 42352
Rule number one of MapReduce:
Thou shall return from Reduce the exact same format that you emit with your key in Map.
Rule number two of MapReduce:
Thou shall reduce the array of values passed to reduce as many times as necessary. The reduce function may be called many times, including on its own partial results.
You've broken both of those rules in your implementation of reduce.
Your Map function is emitting key/value pairs:
key: the port name (you should simply emit the name as the key, not a document)
value: a document representing the three things you need to accumulate (dates, metrics, count)
Try this instead:
map = function() { // if you want to reduce to an array, you have to emit arrays
    emit( this.port_name, { dates : [this.datetime], metrics : [this.metric], count : 1 } );
};

reduce = function(key, values) { // for each key you get an array of values
    var res = { dates : [], metrics : [], count : 0 }; // you must reduce them to one
    values.forEach(function(value) {
        res.dates = value.dates.concat(res.dates);
        res.metrics = value.metrics.concat(res.metrics);
        res.count += value.count; // VERY IMPORTANT: a reduce result may be re-reduced
    });
    return res;
};
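You can check the re-reduce property without MongoDB at all. The sketch below (plain Node.js, sample values are hypothetical) runs the corrected reduce on two partial chunks, re-reduces the partials, and shows that the result matches reducing everything in one pass, which is exactly what MongoDB requires:

```javascript
// The corrected reduce from the answer above.
var reduce = function(key, values) {
    var res = { dates : [], metrics : [], count : 0 };
    values.forEach(function(value) {
        res.dates = value.dates.concat(res.dates);
        res.metrics = value.metrics.concat(res.metrics);
        res.count += value.count;
    });
    return res;
};

// Four emitted values, shaped as the corrected map would emit them
// (hypothetical sample data).
var emitted = [
    { dates : ["2012-10-01T23:00:00Z"], metrics : ["1596.0"], count : 1 },
    { dates : ["2012-10-01T23:01:00Z"], metrics : ["1562.0"], count : 1 },
    { dates : ["2012-10-01T23:02:00Z"], metrics : ["1445.0"], count : 1 },
    { dates : ["2012-10-01T23:03:00Z"], metrics : ["774.0"],  count : 1 }
];

// MongoDB may reduce in chunks, then call reduce again on the partials.
var partial1  = reduce("CL1-A", emitted.slice(0, 2));
var partial2  = reduce("CL1-A", emitted.slice(2));
var rereduced = reduce("CL1-A", [partial1, partial2]);
var allAtOnce = reduce("CL1-A", emitted);

console.log(rereduced.count === allAtOnce.count);               // true
console.log(rereduced.dates.length === allAtOnce.dates.length); // true
```

The original reduce in the question fails this test because its output (`{ dates, metrics, count }`) has a different shape than its input (`{ datetime, metric }`), so the second reduce pass pushes `undefined` values, which is where the `null` entries and the 100-record cutoff (MongoDB's per-reduce batch size) came from.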
Upvotes: 13
Reputation: 1558
Try outputting the map-reduce results to a temporary collection instead of returning them inline in memory. Maybe that is the reason. From the Mongo docs:
{ inline : 1} - With this option, no collection will be created, and the whole map-reduce operation will happen in RAM. Also, the results of the map-reduce will be returned within the result object. Note that this option is possible only when the result set fits within the 16MB limit of a single document. In v2.0, this is your only available option on a replica set secondary.
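Switching from inline output to a collection only changes the `out` field of the command. A minimal sketch, with placeholder values standing in for the question's variables (the collection names here are hypothetical):

```javascript
// Placeholders for the question's variables.
var collection = "port_metrics";          // hypothetical source collection
var map = function() {};                  // the map function from the question
var reduce = function(key, values) {};    // the corrected reduce function

var command = {
    mapreduce : collection,
    map : map.toString(),
    reduce : reduce.toString(),
    // Write results to a collection instead of returning them inline,
    // avoiding the 16MB single-document cap mentioned in the docs.
    // "mr_tmp" is a hypothetical name; { merge : ... } or { reduce : ... }
    // are the other non-inline out modes.
    out : { replace : "mr_tmp" }
};
```

After the command runs, you read the results back with an ordinary query (`db.mr_tmp.find()`) and drop the collection when you are done.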
Also, it may not be the reason, but MongoDB has a data size limitation (2GB) on 32-bit machines.
Upvotes: 1