Cihan Bebek
Cihan Bebek

Reputation: 418

MongoDB aggregate/grouping by key-value pairs

My data looks something like this:

    { 
            "_id" : "9aa072e4-b706-47e6-9607-1a39e904a05a", 
            "customerId" : "2164289-4", 
            "channelStatuses" : {
                    "FOO" : {
                    "status" : "done"
                    }, 
                    "BAR" : {
                    "status" : "error"
                    }
            }, 
            "channel" : "BAR", 
    }

My aggregate/group looks like this:

    { 
            "_id" : {
                    "customerId" : "$customerId", 
                    "channel" : "$channel", 
                    "status" : "$channelStatuses[$channel].status"
            }, 
                    "count" : {
                    "$sum" : 1
            }
    }

So basically with the example data the group should give me a group grouped by:

   {"customerId": "2164289-4", "channel": "BAR", "status": "error"}

But I cannot use []-indexing in a aggregate/group. What should I do instead?

Upvotes: 5

Views: 3055

Answers (1)

Blakes Seven
Blakes Seven

Reputation: 50426

You cannot get the result you want with the current structure using .aggregate(). You "could" change the structure to use an array rather than named keys, and the operation is actually quite simple.

So with a document like:

    { 
            "_id" : "9aa072e4-b706-47e6-9607-1a39e904a05a", 
            "customerId" : "2164289-4", 
            "channelStatuses" : [
                {
                    "channel": "FOO",
                    "status" : "done"
                }, 
                {
                    "channel": "BAR",
                    "status" : "error"
                }
            ], 
            "channel" : "BAR", 
    }

You can then do in modern releases with $filter, $map and $arrayElemAt:

    { "$group": {
        "_id": {
            "customerId" : "$customerId", 
            "channel" : "$channel", 
            "status": {
                "$arrayElemAt": [
                    { "$map": {
                        "input": { "$filter": {
                            "input": "$chanelStatuses",
                            "as": "el", 
                            "cond": { "$eq": [ "$$el.channel", "$channel" ] }
                        }},
                        "as": "el",
                        "in": "$$el.status"
                    }},
                    0
                ]
            }
        },
        "count": { "$sum": 1 }
    }}

Older versions of MongoDB are going to going to require $unwind to access the matched array element.

In MongoDB 2.6 then you can still "pre-filter" the array before unwind:

[
    { "$project": {
        "customerId": 1,
        "channel": 1,
        "status": {
            "$setDifference": [
                { "$map": {
                    "input": "$channelStatuses",
                    "as": "el",
                    "in": {
                        "$cond": [
                            { "$eq": [ "$$el.channel", "$channel" ] },
                            "$$el.status",
                            false
                        ]
                    }
                }},
                [false]
            ]
        }
    }},
    { "$unwind": "$status" },
    { "$group": {
        "_id": {
            "customerId": "$customerId",
            "channel": "$channel",
            "status": "$status"
        },
        "count": { "$sum": 1 }
    }}
]

And anything prior to that you "filter" after $unwind instead:

[
    { "$unwind": "$channelStatuses" },
    { "$project": {
        "customerId": 1,
        "channel": 1,
        "status": "$channelStatuses.status",
        "same": { "$eq": [ "$channelStatuses.status", "$channel" ] }
    }},
    { "$match": { "same": true } },
    { "$group": {
        "_id": "$_id",
        "customerId": { "$first": "$customerId" },
        "channel": { "$first": "$channel" },
        "status": { "$first": "$status" }
    }},
    { "$group": {
        "_id": {
            "customerId": "$customerId",
            "channel": "$channel",
            "status": "$status"
        },
        "count": { "$sum": 1 }
    }}
]

In a lesser version than MongoDB 2.6 you also need to $project the result of the equality test between the two fields and then $match on the result in a seperate stage. You might also note the "two" $group stages, since the first one removes any possible duplicates of the "channel" values after the filter via the $first accumulators. The following $group is exactly the same as in the previous listing.

But if you cannot change the structure and need "flexible" matching of keys where you cannot supply every name, then you must use mapReduce:

db.collection.mapReduce(
    function() {
       emit({
           "customerId": this.customerId,
           "channel": this.channel,
           "status": this.channelStatuses[this.channel].status
       },1);
    },
    function(key,values) {
        return Array.sum(values);
    },
    { "out": { "inline": 1 } }
)

Where of course you can use that sort of notation

Upvotes: 2

Related Questions