Reputation: 4273
My actors collection contains an array-of-documents field called acted_in. Instead of returning the size of acted_in.idmovies, like so: {'$size': '$acted_in.idmovies'}, I want to return the number of distinct values inside $acted_in.idmovies. How can I do that?
c1 = actors.aggregate([
    {"$match": {'$and': [{'fname': f_name}, {'lname': l_name}]}},
    {"$project": {'first_name': '$fname',
                  'last_name': '$lname',
                  'gender': '$gender',
                  'distinct_movies_played_in': {'$size': '$acted_in.idmovies'}}}])
Upvotes: 3
Views: 4043
Reputation: 1010
This should give you what you want:
actors.aggregate([
    {
        $match: {fname: f_name, lname: l_name}
    },
    {
        $unwind: '$acted_in'
    },
    {
        $group: {
            _id: '$_id',
            first_name: {$first: '$fname'},
            last_name: {$first: '$lname'},
            gender: {$first: '$gender'},
            movies: {$addToSet: '$acted_in.idmovies'}
        }
    },
    {
        $project: {
            first_name: 1,
            last_name: 1,
            gender: 1,
            distinct: {$size: '$movies'}
        }
    }
])
After the acted_in array is unwound, $addToSet collects the idmovies values back into a set, which drops the duplicates. Then you just need the number of items in that set, which is the number of distinct movies.
Upvotes: 2
Reputation: 151092
You basically need to include $setDifference in there to obtain the "distinct" items. All "sets" are "distinct" by design, and by obtaining the "difference" between the present array and an empty one [] you get the desired result. Then you can apply $size.
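To make the idea concrete, here is a minimal plain-Python sketch of what the $size / $setDifference expression computes for one document. It only models the semantics and does not talk to MongoDB; the function name and the sample document are hypothetical, and it assumes the idmovies values are simple hashable values such as integers.

```python
def distinct_movie_count(actor):
    """Model of {"$size": {"$setDifference": ["$acted_in.idmovies", []]}}."""
    # "$acted_in.idmovies" resolves to the list of idmovies values,
    # one per element of the acted_in array.
    ids = [entry["idmovies"] for entry in actor["acted_in"]]
    # Set difference against an empty set removes nothing,
    # but treating the list as a set drops the duplicates.
    distinct = set(ids) - set()
    # "$size" is the number of remaining elements.
    return len(distinct)


# Hypothetical sample document: movie 10 appears twice.
sample_actor = {
    "fname": "Jane",
    "lname": "Doe",
    "acted_in": [{"idmovies": 10}, {"idmovies": 12}, {"idmovies": 10}],
}

print(distinct_movie_count(sample_actor))  # 2
```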
You also have some common mistakes/misconceptions. Firstly, when using $match or any MongoDB query expression you do not need $and unless there is an explicit case for it. All query expression arguments are "already" AND conditions unless explicitly stated otherwise, as with $or. So don't use it explicitly in this case.
Secondly, your $project was using explicit field path variables for every field. You do not need to do that just to return a field: outside of usage in an "expression", you can simply use a 1 to include the field under its existing name:
c1 = actors.aggregate([
    { "$match": { "fname": f_name, "lname": l_name } },
    { "$project": {
        # 1 includes a field under its stored name (fname, lname, gender)
        "fname": 1,
        "lname": 1,
        "gender": 1,
        "distinct_movies_played_in": {
            "$size": { "$setDifference": [ "$acted_in.idmovies", [] ] }
        }
    }}
])
In fact, if you are actually using MongoDB 3.4 or greater (and your notation of an element within an array, "$acted_in.idmovies", says you have at least MongoDB 3.2), which has support for $addFields, then use that instead of specifying all the other fields in the document.
c1 = actors.aggregate([
    { "$match": { "fname": f_name, "lname": l_name } },
    { "$addFields": {
        "distinct_movies_played_in": {
            "$size": { "$setDifference": [ "$acted_in.idmovies", [] ] }
        }
    }}
])
That is, unless you explicitly need to return just "some" of the other fields, in which case $project is still the right stage.
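The choice between the two stages can be sketched as a small helper that only builds the pipeline list, so it runs without a database. The function name and the keep_fields parameter are hypothetical; the stages themselves are the ones shown above, and $addFields assumes MongoDB 3.4 or greater.

```python
def build_distinct_count_pipeline(f_name, l_name, keep_fields=None):
    """Build the aggregation pipeline as a plain list of stage dicts."""
    # Count of distinct idmovies values inside the acted_in array.
    distinct_count = {"$size": {"$setDifference": ["$acted_in.idmovies", []]}}
    # Plain key/value pairs in $match are already ANDed together.
    match_stage = {"$match": {"fname": f_name, "lname": l_name}}

    if keep_fields is None:
        # Keep every existing field and add the count (MongoDB 3.4+).
        return [match_stage,
                {"$addFields": {"distinct_movies_played_in": distinct_count}}]

    # Return only the named fields plus the count.
    projection = {name: 1 for name in keep_fields}
    projection["distinct_movies_played_in"] = distinct_count
    return [match_stage, {"$project": projection}]


# Hypothetical names, only to show the two shapes of the result.
print(build_distinct_count_pipeline("Jane", "Doe"))
print(build_distinct_count_pipeline("Jane", "Doe", keep_fields=["gender"]))
```

With a real collection this would be passed straight through, as in c1 = actors.aggregate(build_distinct_count_pipeline(f_name, l_name)).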
The basic rule here is: do not use $unwind for array operations unless you specifically need to perform a $group operation with its _id key pointing at a value obtained from "within" the array.
In all other cases, MongoDB has far more efficient operators for working with arrays than $unwind.
Upvotes: 8