Reputation: 573
I have the following collection:
{"orderID" : "30688", "branch" : "CO", "customerID" : "11396783", "customerEmail" : "[email protected]"}
{"orderID" : "30688", "branch" : "CO", "customerID" : "11396783", "customerEmail" : "[email protected]"}
{"orderID" : "30688", "branch" : "CO", "customerID" : "11396783", "customerEmail" : "[email protected]"}
{"orderID" : "89765", "branch" : "CO", "customerID" : "54157526", "customerEmail" : ""}
{"orderID" : "89765", "branch" : "CO", "customerID" : "54157526", "customerEmail" : ""}
{"orderID" : "21546", "branch" : "CO", "customerID" : "20103585", "customerEmail" : "[email protected]"}
{"orderID" : "21546", "branch" : "CO", "customerID" : "20103585", "customerEmail" : "[email protected]"}
{"orderID" : "21546", "branch" : "KA", "customerID" : "89374792", "customerEmail" : "[email protected]"}
{"orderID" : "21794", "branch" : "NY", "customerID" : "78125522", "customerEmail" : ""}
I need to get all unique customerIDs in a certain branch whose customerEmail is not null. What I expect for "branch":"CO"
{"customerID" : "11396783", "customerEmail" : "[email protected]"}
{"customerID" : "20103585", "customerEmail" : "[email protected]"}
So far I've tried:
db.collection.aggregate([
{ $match: { branch: "CO" } },
{ $group:
{
_id: { customer:"$customerID"}
}
},
{
$group: {_id:"$_id.customer"}
},
{
$addFields: { email: "$customerEmail"}
}
]);
but it doesn't bring the email field.
Upvotes: 2
Views: 2051
Reputation: 151092
It does not include the field because you did not ask for the field to return. The thing you are missing here is using $first
or a similar "accumulator" in order to return the element during the $group
.
Also if you don't want the empty email address then exclude it within the $match
pipeline stage, since that's the most efficient thing to do.
db.collection.aggregate([
{ $match: { branch: "CO", "customerEmail": { "$ne": "" } } },
{ $group:
{
_id: { customer:"$customerID"},
email: { "$first": "$customerEmail" }
}
}
]);
A "pipeline" only returns "output" from stages like $group
or $project
that you actually ask it to. Just like the "Unix pipe" |
operator, the only things available to the "next stage" are what you output.
This should be evident simply from:
db.collection.aggregate([
{ $match: { branch: "CO" } },
{ $group:
{
_id: { customer:"$customerID"}
}
}
]);
Or even:
db.collection.aggregate([
{ $match: { branch: "CO" } },
{ $project:
{
_id: { customer:"$customerID"}
}
}
]);
Which returns of course only the _id
value since that is all you asked for.
You only have access in any pipeline stage to the data which was "output by the previous stage". Within a $group
that means only the _id
for the grouping key, and whatever was specified "explicitly" using a valid "accumulator" for any other properties you wish to return. Any accumulator ( which is valid for a "string" here ) will do, but anything outside of the _id
must use an "accumulator".
I suggest taking the time to look at all the aggregation operators and what they actually do. There is example usage with each operator
Upvotes: 4