Distinct Group By using mongodb with pymongo

Question

I have a collection of entries like this :

db.mesh_captors.save({'arduino': 0xCB, 'pin': 14, 'value': 35, 'date': datetime.utcnow()})
db.mesh_captors.save({'arduino': 0xCB, 'pin': 14, 'value': 63, 'date': datetime.utcnow()})
db.mesh_captors.save({'arduino': 0xCB, 'pin': 15, 'value': 126, 'date': datetime.utcnow()})
db.mesh_captors.save({'arduino': 0x7B, 'pin': 14, 'value': 121, 'date': datetime.utcnow()})

I want to get the last value of each pin of an arduino. With MySQL, I would have wrote this :

SELECT DISTINCT pin, value
FROM mesh_captors
WHERE arduino = 203
GROUP_BY pin
ORDER BY date DESC

But using MongoDB, I am not quite sure of how to do so.

I tried something like this but is it good enough ?

reducer = Code("""
              function (doc, out) {
                  if(out.date == 0 || out.date < doc.date) {
                       out.date = doc.date;
                       out.value = doc.value;
                  }
              }
              """)

captors_value = db.mesh_captors.group(key=['pin'], condition={'arduino': int(arduino_id)}, reduce=reducer, initial={'date': 0})

By now, I takes more than 4.5s to execute the request and it takes more and more time as the number of entries grows.

Asya Kamsky · Accepted Answer

If you can use version 2.1 which is the development version of upcoming 2.2 release, then you can use the new aggregation framework to do this query much faster than you could with map/reduce.

Here is what the aggregation pipeline would look like to get the value that's the latest date for that arguino and pin:

[{$match:{arduino: 0xCB}},
{$project:
       {_id: 0, arduino:1, pin:1, maxVal: {date:1, val:"$value"} }
},
{$group:
       {_id:{"arduino":1, "pin":1},maxDate:{$max:"$maxVal"} }    
},
{$project:
       {_id:0, "arduino":"$_id.arduino" , "pin":"$_id.pin","date":"$maxDate.date",value:"$maxDate.val"}
}]

If run on your sample data, the result is:

> db.mesh_captors.aggregate(agg)
{
    "result" : [
        {
            "arduino" : 203,
            "pin" : 15,
            "date" : "Sat Jun 09 2012 16:22:50 GMT-0700 (PDT)",
            "value" : 126
        },
        {
            "arduino" : 203,
            "pin" : 14,
            "date" : "Sat Jun 09 2012 16:23:00 GMT-0700 (PDT)",
            "value" : 63
        }
    ],
    "ok" : 1
}

You can access aggregation framework from Python via pymongo support for db.runCommand. You would execute db.runCommand passing it the document

{"aggregate":"mesh_captors", "pipeline":}

Distinct Group By using mongodb with pymongo

Answers (2)

Related Questions