kl02
kl02

Reputation: 582

Combine Different Grouping Totals in Aggregate Output

Now that I've had a weekend of banging my head on $project, aggregate(), and $group, it's time for another round of throwing myself on your mercy. I'm trying to do a call where I get back the totals for users, grouped by sex (this was the easier part) and grouped by age range (this is defeating me).

I got it to work with one group:

        Person.aggregate([
            {
                $match: {
                    user_id: id
                }
            },
            {
                $group: {
                        _id: '$gender',
                        total: { $sum: 1 }
                }
            }
        ])
        .exec(function(err, result) {
                etc...

From that, it'll give me how many men, how many women in a nice json output. But if I add a second group, it seems to skip the first and throw hissy fits about the second:

        Person.aggregate([
            {
                $match: {
                    user_id: id
                }
            },
            {
                $group: {
                        _id: '$gender',
                        total: { $sum: 1 }
                },
                $group: {
                        _id: '$age',
                        age: { $gte: 21 },
                        age: { $lte: 30 },
                        total: { $sum: 1 }
                }
            }
        ])
        .exec(function(err, result) {
                etc...

It doesn't like the $gte or $lte. If I switch it to $project, then it'll do the gte/lte but throws fits about $sum or $count. On top of that, I can't find any examples anywhere of how to construct a multi-request return. It's all just "here's this one thing," but I don't want to make 12+ calls just to get all the Person age-groups. I was hoping for output that looks something like this:

    [
        {"_id":"male","total":49},
        {"_id":"woman","total":42},
        {"_id":"age0_10", "total": 1},
        {"_id":"age11_20", "total": 5},
        {"_id":"age21_30", "total": 15}
    ]

(I have no idea how to make the _id for age be something other than the actual age, which doesn't make sense, b/c I don't want an id of 1517191919 or whatever, I want a reliable name so I know where to output it in my template. So I do know that _id: "$age" won't give me what I want, but I don't know how to get what I want, either.)

The only time I've seen more than one thing, it was a $match, a $group, and a $project. But if $project means I can't use $sum or $count, can I do multiple $groups, and if I can, what's the trick to it?

Upvotes: 1

Views: 411

Answers (1)

Blakes Seven
Blakes Seven

Reputation: 50406

As for the case of producing the results in different age groupings, the $cond operator of the aggregation framework can help here. As a ternary operator, it takes a logical result ( if condition ) and can return a value where true ( then ) or otherwise where false ( else ). In the case of varying age groups you would "nest" the calls in the else condition to meet each range until logically exhausted.

The overall case is not really practical to do in a single pass with both results for "gender" and "age" in groupings. Whilst it "could" be done, the only method is basically accumulating all data in arrays and working that out again for subsuquent groupings. Not a great idea, as it almost always would break the practical BSON limit of 16MB when attempting to keep the data. So a better approach is generally required.

As such, where the API supports ( you are under nodejs, so it does ), then it is usually best to run each query separately and combine the results. The node async library has just such features:

async.concat(
    [
        // Gender aggregator
        [
            { "$group": {
                "_id": "$gender",
                "total": { "$sum": 1 }
            }}
        ],
        // Age aggregator
        [
            { "$group": {
                "_id": {
                    "$cond": {
                        "if": { "$lte": [ "$age", 10 ] },
                        "then": "age_0_10",
                        "else": {
                            "$cond": {
                               "if": { "$lte": [ "$age", 20 ] },
                               "then": "age_11_20",
                               "else": {
                                   "$cond": {
                                       "if": { "$lte": [ "$age", 30 ] },
                                       "then": "age_21_30",
                                       "else": "age_over_30"
                                   }
                               }
                            }
                        }
                    }
                },
                "total": { "$sum": 1 }
            }}
        ]
    ],
    function(pipeline,callback) {
        Person.aggregate(pipeline,callback);
    },
    function(err,results) {
        if (err) throw err;
        console.log(results);
    }
);

The default execution of async.concat here will kick off the tasks to run in parallel, so both can be running on the server at the same time. Each pipeline in the input array will be passed to the aggregate method, which is going to then return the results and combine the output arrays in the final result.

The end result is not only do you have the results nicely keyed to age groups, but the two result sets appear to be in the same combined response, with no other work required to merge the content.

This is not only convenient, but the parallel execution makes this much more time efficient and far less taxing ( if not beating the impossible ) on the aggregation method being used to return the results.

Upvotes: 1

Related Questions