MANCHUCK

Reputation: 2462

Using $group to sum fields from multiple subdocuments in MongoDB

Given the following document:

{
        "_id" : ObjectId("53cd79bb300ccae6b3904402"),
        "name" : "test product",
        "sku" : "product-1",
        "price" : 35,
        "cost" : 12,
        "max_cpc" : 100,
        "price_in_cents" : 3500,
        "cost_in_cents" : 1200,
        "max_cpc_in_cents" : 10000,
        "clicks" : [
                {
                        "date" : ISODate("2014-04-25T00:00:00Z"),
                        "clicks" : 2,
                        "channel" : "google",
                        "campaign" : "12345687",
                        "group" : "987654321"
                },
                {
                        "date" : ISODate("2014-04-25T00:00:00Z"),
                        "clicks" : 3,
                        "channel" : "google",
                        "campaign" : "8675309",
                        "group" : "9035768"
                },
                {
                        "date" : ISODate("2014-04-24T00:00:00Z"),
                        "clicks" : 1,
                        "channel" : "google",
                        "campaign" : "8675309",
                        "group" : "9035768"
                }
        ],
        "impressions" : [
                {
                        "date" : ISODate("2014-04-25T00:00:00Z"),
                        "impressions" : 15,
                        "channel" : "google",
                        "campaign" : "8675309",
                        "group" : "9035768"
                },
                {
                        "date" : ISODate("2014-04-24T00:00:00Z"),
                        "impressions" : 33,
                        "channel" : "google",
                        "campaign" : "8675309",
                        "group" : "9035768"
                }
        ]
}

I would like to add up the total clicks and the total impressions for this document. I cannot figure out how to get the pipeline for aggregate set up correctly.

The end result would be to have

{
    _id: ObjectId("53cd79bb300ccae6b3904402"),
    total_clicks: 6,
    total_impressions: 48
}

Upvotes: 1

Views: 2325

Answers (1)

Neil Lunn

Reputation: 151132

This is a relatively simple aggregation operation, but the thing to be aware of is that you should apply the $unwind operation to each array separately:

db.collection.aggregate([

    // Unwind the first array
    { "$unwind": "$clicks" },

    // Sum results and keep the other array per document
    { "$group": {
        "_id": "$_id",
        "total_clicks": { "$sum": "$clicks.clicks" },
        "impressions": { "$first": "$impressions" }
    }},

    // Unwind the second array
    { "$unwind": "$impressions" },

    // Group the final result keeping the first result
    { "$group": {
        "_id": "$_id",
        "total_clicks": { "$first": "$total_clicks" },
        "total_impressions": { "$sum": "$impressions.impressions" }
    }}

])

That gives you the results you want.

{
    "_id": ObjectId("53cd79bb300ccae6b3904402"),
    "total_clicks": 6,
    "total_impressions": 48
}

The $first operator is fine to use here because you are grouping per document, so every unwound row of a document carries the same value. If you want totals across all documents or by another key, sum each array in the same way and then add a final $group stage for the other grouping level.
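As a minimal sketch of the "across all documents" case, the pipeline below keeps the four stages from above and appends one more $group with "_id": null, which collects every per-document total into a single result. The variable name pipeline is only for illustration, and the aggregate call is left as a comment so the snippet also runs outside the shell:

```javascript
// Same per-document stages as above, plus one extra roll-up stage.
const pipeline = [
    // Sum the clicks per document, carrying the other array along
    { "$unwind": "$clicks" },
    { "$group": {
        "_id": "$_id",
        "total_clicks": { "$sum": "$clicks.clicks" },
        "impressions": { "$first": "$impressions" }
    }},

    // Sum the impressions per document, carrying the click total along
    { "$unwind": "$impressions" },
    { "$group": {
        "_id": "$_id",
        "total_clicks": { "$first": "$total_clicks" },
        "total_impressions": { "$sum": "$impressions.impressions" }
    }},

    // Extra stage: "_id": null puts every document into one group
    { "$group": {
        "_id": null,
        "total_clicks": { "$sum": "$total_clicks" },
        "total_impressions": { "$sum": "$total_impressions" }
    }}
];

// In the mongo shell: db.collection.aggregate(pipeline)
```

To group by another key instead, that key would have to be carried through the earlier $group stages (for example with $first) and then used as the "_id" of the last stage.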

Just remember to "expand" each array separately. If you $unwind both at the same time, you end up duplicating each element of one array by the number of elements in the other.
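The duplication is easy to see with a small plain-JavaScript imitation of $unwind. This is only an illustration: unwind here is a hypothetical helper rather than a MongoDB API, it ignores edge cases such as missing or empty arrays, and the document is trimmed down to the fields that matter:

```javascript
// Hypothetical stand-in for $unwind: one output row per array element.
function unwind(docs, field) {
    return docs.flatMap((doc) =>
        doc[field].map((item) => ({ ...doc, [field]: item }))
    );
}

// Trimmed-down version of the document from the question
const doc = {
    _id: "53cd79bb300ccae6b3904402",
    clicks: [{ clicks: 2 }, { clicks: 3 }, { clicks: 1 }],
    impressions: [{ impressions: 15 }, { impressions: 33 }]
};

// Unwinding both arrays back to back gives 3 x 2 = 6 rows
const crossed = unwind(unwind([doc], "clicks"), "impressions");

const inflatedClicks = crossed.reduce(
    (sum, d) => sum + d.clicks.clicks, 0
);
const inflatedImpressions = crossed.reduce(
    (sum, d) => sum + d.impressions.impressions, 0
);

console.log(crossed.length);      // 6
console.log(inflatedClicks);      // 12, every click entry counted twice
console.log(inflatedImpressions); // 144, every impression entry counted three times
```

Grouping after the first $unwind collapses the rows back to one per document before the second array is expanded, which is why the correct totals are 6 and 48 rather than 12 and 144.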


Depending on your usage patterns, you might consider changing your schema a little. For example, as this data only really varies by "type", you could change it to a single array of "events":

{
        "_id" : ObjectId("53cd79bb300ccae6b3904402"),
        "name" : "test product",
        "sku" : "product-1",
        "price" : 35,
        "cost" : 12,
        "max_cpc" : 100,
        "price_in_cents" : 3500,
        "cost_in_cents" : 1200,
        "max_cpc_in_cents" : 10000,
        "events" : [
                {
                        "type": "click",
                        "date" : ISODate("2014-04-25T00:00:00Z"),
                        "number" : 2,
                        "channel" : "google",
                        "campaign" : "12345687",
                        "group" : "987654321"
                },
                {
                        "type": "click",
                        "date" : ISODate("2014-04-25T00:00:00Z"),
                        "number" : 3,
                        "channel" : "google",
                        "campaign" : "8675309",
                        "group" : "9035768"
                },
                {
                        "type": "click",
                        "date" : ISODate("2014-04-24T00:00:00Z"),
                        "number" : 1,
                        "channel" : "google",
                        "campaign" : "8675309",
                        "group" : "9035768"
                },
                {
                        "type": "impression", 
                        "date" : ISODate("2014-04-25T00:00:00Z"),
                        "number" : 15,
                        "channel" : "google",
                        "campaign" : "8675309",
                        "group" : "9035768"
                },
                {
                        "type": "impression", 
                        "date" : ISODate("2014-04-24T00:00:00Z"),
                        "number" : 33,
                        "channel" : "google",
                        "campaign" : "8675309",
                        "group" : "9035768"
                }
        ]
}

The changed aggregation structure for something like this would look like:

db.collection.aggregate([

    // Unwind the events array
    { "$unwind": "$events" },

    // Group each "type" conditionally
    { "$group": {
        "_id": "$_id",
        "total_clicks": {
            "$sum": {
                "$cond": [
                    { "$eq": [ "$events.type", "click" ] },
                    "$events.number",
                    0
                ]
            }
        },
        "total_impressions": {
            "$sum": {
                "$cond": [
                    { "$eq": [ "$events.type", "impression" ] },
                    "$events.number",
                    0
                ]
            }
        }
    }}

])

That uses $cond which is a ternary operator, evaluating a logical condition and choosing which value to pass to $sum depending on whether the condition was true or false.
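For readers more used to application code, the conditional sum behaves roughly like a reduce with a ternary inside. The sketch below is plain JavaScript rather than a pipeline; sumWhereType is a made-up helper name, and the events are the unwound entries from the example document reduced to the two fields used:

```javascript
// The unwound "events" entries, keeping only type and number
const events = [
    { type: "click", number: 2 },
    { type: "click", number: 3 },
    { type: "click", number: 1 },
    { type: "impression", number: 15 },
    { type: "impression", number: 33 }
];

// Rough equivalent of:
// { "$sum": { "$cond": [ { "$eq": [ "$events.type", type ] }, "$events.number", 0 ] } }
function sumWhereType(rows, type) {
    return rows.reduce(
        (sum, e) => sum + (e.type === type ? e.number : 0),
        0
    );
}

const totalClicks = sumWhereType(events, "click");
const totalImpressions = sumWhereType(events, "impression");

console.log(totalClicks);      // 6
console.log(totalImpressions); // 48
```

Rows of the other type contribute 0, so both totals can be computed in the same pass over the unwound rows.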

Or you could just aggregate on the "type" alone:

db.collection.aggregate([

    // Unwind the events array
    { "$unwind": "$events" },

    // Group by document and event "type"
    { "$group": {
        "_id": { "_id": "$_id", "type": "$events.type" },
        "total": { "$sum": "$events.number" }
    }}

])

With a slightly different result:

{
    "_id": {
        "_id": ObjectId("53cd79bb300ccae6b3904402"),
        "type": "click"
    },
    "total": 6
},
{
    "_id": {
        "_id": ObjectId("53cd79bb300ccae6b3904402"),
        "type": "impression"
    },
    "total": 48
}

Finally, if you do not need to atomically update fields outside the array whenever you add or update array members, then placing your "event stream" in a separate collection removes the need to call $unwind:

{
    "sku_id" : ObjectId("53cd79bb300ccae6b3904402"),
    "name" : "test product",
    "sku" : "product-1",
    "type": "click",
    "date" : ISODate("2014-04-25T00:00:00Z"),
    "number" : 2,
    "channel" : "google",
    "campaign" : "12345687",
    "group" : "987654321"
},
{
    "sku_id" : ObjectId("53cd79bb300ccae6b3904402"),
    "name" : "test product",
    "sku" : "product-1",
    "type": "impression", 
    "date" : ISODate("2014-04-24T00:00:00Z"),
    "number" : 33,
    "channel" : "google",
    "campaign" : "8675309",
    "group" : "9035768"
}

And the statement:

db.eventstream.aggregate([
    { "$group": {
        "_id": "$sku_id",
        "total_clicks": {
            "$sum": {
                "$cond": [
                    { "$eq": [ "$type", "click" ] },
                    "$number",
                    0
                ]
            }
        },
        "total_impressions": {
            "$sum": {
                "$cond": [
                    { "$eq": [ "$type", "impression" ] },
                    "$number",
                    0
                ]
            }
        }
    }}
])

Upvotes: 6
