MFB
MFB

Reputation: 19807

How can I use Mongodb Aggregation in this example?

I am currently using Python to build many of my results instead of MongoDB itself. I am trying to get my head around Aggregation, but I'm struggling a bit. Here is an example of what I am doing currently which perhaps could be better handled by MongoDB.

I have a collection of programs and a collection of episodes. Each program has a list of episodes (DBRefs) associated with it. (The episodes are stored in their own collection because both programs and episodes are quite complex and deep, so embedding is impractical). Each episode has a duration (float). If I want to find a program's average episode duration, I do this:

episodes = list(db.Episodes.find({'Program':DBRef('Programs',ObjectId(...))}))
durations = set(e['Duration'] for e in episodes if e['Duration'] > 0)
avg_mins = int(sum(durations) / len(durations) / 60

This is pretty slow when a program has over 1000 episodes. Is there a way I can do it in MongoDB?

Here is some sample data in Mongo shell format. There are three episodes belonging to the same program. How can I calculate the average episode duration for the program?

> db.Episodes.find({
    '_Program':DBRef('Programs',ObjectId('4ec634fbf4c4005664000313'))},
   {'_Program':1,'Duration':1}).limit(3)

{
    "_id" : ObjectId("506c15cbf4c4005f9c40f830"),
    "Duration" : 1643.856,
    "_Program" : DBRef("Programs", ObjectId("4ec634fbf4c4005664000313"))
}
{
    "_id" : ObjectId("506c15d3f4c4005f9c40f8cf"),
    "Duration" : 1598.088,
    "_Program" : DBRef("Programs", ObjectId("4ec634fbf4c4005664000313"))
}
{
    "_id" : ObjectId("506c15caf4c4005f9c40f80e"),
    "_Program" : DBRef("Programs", ObjectId("4ec634fbf4c4005664000313")),
    "Duration" : 1667.04
}

Upvotes: 2

Views: 7378

Answers (1)

MFB
MFB

Reputation: 19807

I figured it out, and it is ridiculously fast compared to pulling it all into Python.

p = db.Programs.find_one({'Title':'...'})

pipe = [
        {'$match':{'_Program':DBRef('Programs',p['_id']),'Duration':{'$gt':0}}},
        {'$group':{'_id':'$_Program', 'AverageDuration':{'$avg':'$Duration'}}}
        ]

eps = db.Episodes.aggregate(pipeline=pipe)

print eps['result']

Upvotes: 6

Related Questions