K. Barresi

Reputation: 1315

Aggregation Grouping by Two Fields in MongoDB

I have a schema like the following:

Event : {
    eventType : Number,
    created : Date,
}

My end goal is to create a line graph for each eventType that shows how many events of that type were posted each day.

I've never used the MongoDB aggregation functions, so I'm a little unsure how to go about this. I read through the MongoDB aggregation documentation, and my initial thought is to do two grouping passes and one projection pass:

  1. Group every event into a day
  2. Group the results of this by eventType
  3. Project these results so the output is in a nice format to graph.

So my output would look something like this (so I can put it on line graphs):

[
    {
        eventType: 0,
        days: [ ISODate("2015-01-01"), ISODate("2015-01-02"), ISODate("2015-01-03") ],
        totals: [ 0, 15, 3 ]
    },
    {
        eventType: 1,
        days: [ ISODate("2015-01-01"), ISODate("2015-01-02"), ISODate("2015-01-03") ],
        totals: [ 4, 5, 2 ]
    },
    ...
]

I'm not sure if conceptually that's correct, and I'm even less sure about the syntax this would require. Any help would be appreciated!

Upvotes: 0

Views: 119

Answers (1)

Blakes Seven

Reputation: 50406

Yes, the basic idea here is to group twice, and you also want to make use of the date aggregation operators:

db.event.aggregate([
    { "$group": {
        "_id": {
            "eventType":"$eventType",
            "date": { 
                "year": { "$year": "$created" },
                "month": { "$month": "$created" },
                "day": { "$dayOfMonth": "$created" }
            }
        },
        "total": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.eventType",
        "days": { "$push": "$_id.date" },
        "totals": { "$push": "$total" }
    }}
])
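
Note that this first form returns each day as a { year, month, day } object rather than a date. If the graphing code wants real Date values, a small client-side conversion is enough. This is only a sketch: `dayKeyToDate` and the `sample` document are made-up names for illustration, and it assumes the date operators worked in UTC (their default).

```javascript
// Convert the { year, month, day } key produced by $year/$month/$dayOfMonth
// into a JavaScript Date at midnight UTC.
function dayKeyToDate(key) {
  // $month is 1-based, while Date.UTC expects a 0-based month.
  return new Date(Date.UTC(key.year, key.month - 1, key.day));
}

// Hypothetical result document, shaped like the output of the pipeline above.
const sample = {
  _id: 0,
  days: [
    { year: 2015, month: 1, day: 2 },
    { year: 2015, month: 1, day: 3 }
  ],
  totals: [15, 3]
};

const dates = sample.days.map(dayKeyToDate);
console.log(dates.map(d => d.toISOString()));
// [ '2015-01-02T00:00:00.000Z', '2015-01-03T00:00:00.000Z' ]
```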

Or you can use date math to return a timestamp value instead:

db.event.aggregate([
    { "$group": {
        "_id": {
            "eventType":"$eventType",
            "date": { 
                "$subtract": [
                    { "$subtract": [ "$created", new Date(0) ] },
                    { "$mod": [
                        { "$subtract": [ "$created", new Date(0) ] },
                        1000 * 60 * 60 * 24
                    ]}
                ]
            }
        },
        "total": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.eventType",
        "days": { "$push": "$_id.date" },
        "totals": { "$push": "$total" }
    }}
])
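
If the nested operators are hard to read: the date math is meant to take the milliseconds since the epoch and subtract the remainder left after dividing by the number of milliseconds in one day, which rounds the value down to midnight UTC. Here is the same arithmetic in plain JavaScript, as a sketch only. `startOfUtcDay` and the `created` value are illustrative names, not part of MongoDB.

```javascript
// Number of milliseconds in one day.
const MS_PER_DAY = 1000 * 60 * 60 * 24;

// Round a Date down to the start of its UTC day, returned as epoch milliseconds.
// Intended for dates on or after the epoch (1970-01-01).
function startOfUtcDay(date) {
  // Equivalent of { "$subtract": [ "$created", new Date(0) ] }
  const ms = date.getTime();
  // Equivalent of the outer $subtract combined with $mod
  return ms - (ms % MS_PER_DAY);
}

// Hypothetical "created" value for illustration.
const created = new Date("2015-01-02T17:45:30Z");
const dayMs = startOfUtcDay(created);
console.log(new Date(dayMs).toISOString()); // 2015-01-02T00:00:00.000Z
```

Every event created on the same UTC day ends up with the same number, which is what makes it usable as a grouping key.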

Personally, two arrays do not seem as readable to me as one, so I prefer this:

db.event.aggregate([
    { "$group": {
        "_id": {
            "eventType":"$eventType",
            "date": { 
                "$subtract": [
                    { "$subtract": [ "$created", new Date(0) ] },
                    { "$mod": [
                        { "$subtract": [ "$created", new Date(0) ] },
                        1000 * 60 * 60 * 24
                    ]}
                ]
            }
        },
        "total": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.eventType",
        "days": { 
            "$push": { 
                "date": "$_id.date", "total": "$total"
            }
        }
    }}
])

Here each array element already contains both the date and its total.
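
If you still need the two parallel arrays from the question, including zeros for days with no events, that reshaping can be done in client code. The aggregation only returns days that actually have events, and $group does not promise any particular output order, so the sketch below sorts and fills gaps itself. `toSeries` and the sample document are hypothetical, and the sketch assumes each `date` arrives as a plain number of milliseconds.

```javascript
const DAY_MS = 1000 * 60 * 60 * 24;

// Turn one result document into parallel arrays, sorted by day, with
// missing days between the first and last day filled in as zero.
function toSeries(doc) {
  const days = [];
  const totals = [];
  // Sort a copy so the input document is left untouched.
  const points = doc.days.slice().sort((a, b) => a.date - b.date);
  if (points.length === 0) return { eventType: doc._id, days, totals };

  // Look up totals by their day timestamp.
  const byDay = new Map(points.map(p => [p.date, p.total]));
  const first = points[0].date;
  const last = points[points.length - 1].date;
  for (let t = first; t <= last; t += DAY_MS) {
    days.push(new Date(t));
    totals.push(byDay.get(t) || 0);
  }
  return { eventType: doc._id, days, totals };
}

// Hypothetical result document: out of order and with 2015-01-02 missing.
const series = toSeries({
  _id: 0,
  days: [
    { date: Date.UTC(2015, 0, 3), total: 3 },
    { date: Date.UTC(2015, 0, 1), total: 4 }
  ]
});
console.log(series.totals); // [ 4, 0, 3 ]
```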

Most graphing packages are quite flexible once you look at their documentation on supported data structures. It is a common MongoDB output request to format data in a particular way (all in arrays) for a "popular package", but that package actually does support standard lists of objects as long as you configure it to. It's just that the "basic examples" don't show this.

Upvotes: 1
