azathoth
azathoth

Reputation: 573

Include fields in mongodb aggregate

I have the following collection:

{"orderID" : "30688", "branch" : "CO", "customerID" : "11396783", "customerEmail" : "[email protected]"}
{"orderID" : "30688", "branch" : "CO", "customerID" : "11396783", "customerEmail" : "[email protected]"}
{"orderID" : "30688", "branch" : "CO", "customerID" : "11396783", "customerEmail" : "[email protected]"}
{"orderID" : "89765", "branch" : "CO", "customerID" : "54157526", "customerEmail" : ""}
{"orderID" : "89765", "branch" : "CO", "customerID" : "54157526", "customerEmail" : ""}
{"orderID" : "21546", "branch" : "CO", "customerID" : "20103585", "customerEmail" : "[email protected]"}
{"orderID" : "21546", "branch" : "CO", "customerID" : "20103585", "customerEmail" : "[email protected]"}
{"orderID" : "21546", "branch" : "KA", "customerID" : "89374792", "customerEmail" : "[email protected]"}
{"orderID" : "21794", "branch" : "NY", "customerID" : "78125522", "customerEmail" : ""}

I need to get all unique customerIDs in a certain branch whose customerEmail is not null. What I expect for "branch":"CO"

{"customerID" : "11396783", "customerEmail" : "[email protected]"}
{"customerID" : "20103585", "customerEmail" : "[email protected]"}

So far I've tried:

db.collection.aggregate([
    { $match: { branch: "CO" } },   
    { $group:
        {
            _id: { customer:"$customerID"}
        }
    },
    {
        $group: {_id:"$_id.customer"}
    },
    {
        $addFields: {  email: "$customerEmail"} 
    }
]);

but it doesn't bring the email field.

Upvotes: 2

Views: 2051

Answers (1)

Neil Lunn
Neil Lunn

Reputation: 151092

It does not include the field because you did not ask for the field to return. The thing you are missing here is using $first or a similar "accumulator" in order to return the element during the $group.

Also if you don't want the empty email address then exclude it within the $match pipeline stage, since that's the most efficient thing to do.

db.collection.aggregate([
    { $match: { branch: "CO", "customerEmail": { "$ne": "" } } },
    { $group:
        {
            _id: { customer:"$customerID"},
            email: { "$first": "$customerEmail" }
        }
    }
]);

A "pipeline" only returns "output" from stages like $group or $project that you actually ask it to. Just like the "Unix pipe" | operator, the only things available to the "next stage" are what you output.

This should be evident simply from:

db.collection.aggregate([
    { $match: { branch: "CO" } },   
    { $group:
        {
            _id: { customer:"$customerID"}
        }
    }
]);

Or even:

db.collection.aggregate([
    { $match: { branch: "CO" } },   
    { $project:
        {
            _id: { customer:"$customerID"}
        }
    }
]);

Which returns of course only the _id value since that is all you asked for.

You only have access in any pipeline stage to the data which was "output by the previous stage". Within a $group that means only the _id for the grouping key, and whatever was specified "explicitly" using a valid "accumulator" for any other properties you wish to return. Any accumulator ( which is valid for a "string" here ) will do, but anything outside of the _id must use an "accumulator".

I suggest taking the time to look at all the aggregation operators and what they actually do. There is example usage with each operator

Upvotes: 4

Related Questions