Reputation: 1824
Each document in a collection called groups has a field called actives, which is a list of "subdocuments", i.e. things of the form {key: value}. One field (key) of the subdocuments is id_, which is a string.
If I take the set of all subdocuments present in all the documents of groups, no two of them have the same id_, i.e. id_ uniquely identifies each subdocument. Now suppose I get a new subdocument. I need to run a program with the subdocument's id_ that goes to a website and extracts info about the subdocument. Within this info I find the group that the subdocument belongs to. However, I don't want to run this program if some document of groups already holds a subdocument with the same id_ as the "new" subdocument.
How can I list the ids of all the subdocuments of all the documents (or instances) of groups?
Edit:
Suppose that the documents of the groups collection are:
doc1: {"neighbourhood": "n1", "actives": [{"id_": "MHTEQ", "info": "a_long_string"}, {"id_": "PNPQA", "info": "a_long_string"}]}
doc2: {"neighbourhood": "n2", "actives": [{"id_": "MERVX", "info": "a_long_string"}, {"id_": "ZDKJW", "info": "a_long_string"}]}
What I want to do is to list all the "id_" values, i.e.
def list_ids(groups):
    do_sth_with_groups
    return a_list
print(list_ids(groups))
output: ["MHTEQ", "PNPQA", "MERVX", "ZDKJW"]
Upvotes: 0
Views: 1500
Reputation: 36
Use the aggregation pipeline with the $unwind and $project operators.
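def list_ids(db):
    # 'groups' is the collection name from the question; adjust if yours differs.
    results = db['groups'].aggregate(
        [
            # Keep only the actives array from each document.
            {"$project": {"actives": 1, "_id": 0}},
            # Emit one document per element of actives.
            {"$unwind": "$actives"},
            # Keep only the subdocument's id_, exposed as id_str.
            {"$project": {"id_str": "$actives.id_", "_id": 0}}
        ]
    )
    # Each item is a dict of the form {"id_str": ...}.
    return list(results)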
https://docs.mongodb.com/v3.2/reference/operator/aggregation/unwind/
https://docs.mongodb.com/v3.2/reference/operator/aggregation/project/
Sample output
{
"id_str" : "MHTEQ"
}
{
"id_str" : "PNPQA"
}
{
"id_str" : "MERVX"
}
{
"id_str" : "ZDKJW"
}
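The pipeline returns one small document per id rather than the flat list shown in the question. A minimal sketch of the last step, assuming the results have the {"id_str": ...} shape of the sample output; ids_from_results and is_known are hypothetical helper names, and sample_results stands in for the cursor that aggregate() would return:

```python
def ids_from_results(results):
    """Flatten documents like {"id_str": "MHTEQ"} into a list of strings."""
    return [doc["id_str"] for doc in results]


def is_known(new_id, known_ids):
    """Return True if new_id is already stored, so the scraper can be skipped."""
    return new_id in known_ids


# Stand-in for the aggregation cursor; values copied from the sample output.
sample_results = [
    {"id_str": "MHTEQ"},
    {"id_str": "PNPQA"},
    {"id_str": "MERVX"},
    {"id_str": "ZDKJW"},
]

ids = ids_from_results(sample_results)
known = set(ids)  # a set makes repeated membership checks cheap

print(ids)
print(is_known("MHTEQ", known))
print(is_known("ABCDE", known))
```

Building the set once and checking new ids against it avoids querying the database for every incoming subdocument, at the cost of holding all ids in memory.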
Upvotes: 1