terjeto
terjeto

Reputation: 386

pymongo / mongodb advanced query

I'm a novice in pymongo/mongodb and now I have a challenge.

I have the following structure stored in mongodb (v 2.04).

{
    "t": <timestamp>, 
    "d": {
        "uid": <string>,
        "action": <string>
    }
}

This structure tracks user actions and is slightly reduced in complexity from my original. The data is pretty huge and the query will have a limiting datespan to reduce the results.

What I would like is to be able to create a table of the users who did the most actions during a certain timespan.

Table:

Rank    Uid    #num actions
1       5      235
2       237    234
3       574    229

So far I've only bits and pieces of the query:

query = {"t": {"$lte": end_utc, "$gte": start_utc}}
db.actions.find(query).distinct("d.uid")

Which will simply produce a list of unique uid's. How can I query (using pymongo) to get a list such as:

[
    {
        "actions": 100,
        "uid": 273
    },
    {
        "actions": 99",
        "uid": 632
    }..n sorted on actions descending

]

Upvotes: 2

Views: 680

Answers (1)

JohnnyHK
JohnnyHK

Reputation: 312115

If you're using MongoDB 2.1+ you can use the aggregation framework for this type of query:

db.actions.aggregate([
    # Filter the docs to just those within the specified timerange
    {"$match": {"t": {"$lte": end_utc, "$gte": start_utc}}},

    # Group the docs on d.uid, assembling a count of action docs with each value
    {"$group": {"_id": "$d.uid", "actions": {"$sum": 1}}},

    # Sort by actions, descending
    {"$sort": { "actions": -1 }}
])

Upvotes: 5

Related Questions