Reputation: 18102
I have a collection of entries like this :
db.mesh_captors.save({'arduino': 0xCB, 'pin': 14, 'value': 35, 'date': datetime.utcnow()})
db.mesh_captors.save({'arduino': 0xCB, 'pin': 14, 'value': 63, 'date': datetime.utcnow()})
db.mesh_captors.save({'arduino': 0xCB, 'pin': 15, 'value': 126, 'date': datetime.utcnow()})
db.mesh_captors.save({'arduino': 0x7B, 'pin': 14, 'value': 121, 'date': datetime.utcnow()})
I want to get the last value of each pin of an arduino. With MySQL, I would have wrote this :
SELECT DISTINCT pin, value
FROM mesh_captors
WHERE arduino = 203
GROUP_BY pin
ORDER BY date DESC
But using MongoDB, I am not quite sure of how to do so.
I tried something like this but is it good enough ?
reducer = Code("""
function (doc, out) {
if(out.date == 0 || out.date < doc.date) {
out.date = doc.date;
out.value = doc.value;
}
}
""")
captors_value = db.mesh_captors.group(key=['pin'], condition={'arduino': int(arduino_id)}, reduce=reducer, initial={'date': 0})
By now, I takes more than 4.5s to execute the request and it takes more and more time as the number of entries grows.
Upvotes: 0
Views: 654
Reputation: 42352
If you can use version 2.1 which is the development version of upcoming 2.2 release, then you can use the new aggregation framework to do this query much faster than you could with map/reduce.
Here is what the aggregation pipeline would look like to get the value that's the latest date for that arguino and pin:
[{$match:{arduino: 0xCB}},
{$project:
{_id: 0, arduino:1, pin:1, maxVal: {date:1, val:"$value"} }
},
{$group:
{_id:{"arduino":1, "pin":1},maxDate:{$max:"$maxVal"} }
},
{$project:
{_id:0, "arduino":"$_id.arduino" , "pin":"$_id.pin","date":"$maxDate.date",value:"$maxDate.val"}
}]
If run on your sample data, the result is:
> db.mesh_captors.aggregate(agg)
{
"result" : [
{
"arduino" : 203,
"pin" : 15,
"date" : "Sat Jun 09 2012 16:22:50 GMT-0700 (PDT)",
"value" : 126
},
{
"arduino" : 203,
"pin" : 14,
"date" : "Sat Jun 09 2012 16:23:00 GMT-0700 (PDT)",
"value" : 63
}
],
"ok" : 1
}
You can access aggregation framework from Python via pymongo support for db.runCommand
. You would execute db.runCommand
passing it the document
{"aggregate":"mesh_captors", "pipeline":<pipeline-goes-here>}
Upvotes: 2
Reputation: 571
You can't use group in sharded collections and that makes it bad choice for most of the tasks. If your not using sharded collections that might be close to the best performance u can get. (Correct me if I'm wrong) You should try implementing the same task with MapReduce and aggregate and compare performance.
This article should help you better understand some advanced aggregation from mongodb.
Upvotes: 0