Reputation: 12260
I have energy_carrier
documents that are nested in a collection energy_ carrier_groups
.
I reference those energy_carrier
documents from another collection tech
and would like to resolve the reference with a $lookup
aggregation.
=> How can I define a sub query inside $lookup
that preprocesses /unwinds the energy carriers before I do the actual join/lookup?
My preferred way would be to specify a path for the from
or foreignField
option to target the nested documents of the energy_carrier_groups
collection:
"from": "energy_carrier_groups.energy_carriers"
or
"from": "energy_carrier_groups"
"foreignField": "energy_carriers._id".
However, that does not seem to work.
I found that $lookup
supports let
and pipeline
arguments as an alternative to the options localField
and foreignField
(since Version 3.6) and that might be the way to go.
Since version 5.0 its also possible to combine all four options:
(A requirement for mongodb version 5.0 is a CPU that supports AVX and mine doesn't.)
All that options make knots in my head! Could you please tell me how I should formulate the pipeline to resolve my reference from tech to energy carrier? Can't be that difficult, could it?
Code to create some example data:
import bson
from pymongo import MongoClient
def unique_id():
return bson.objectid.ObjectId()
client = MongoClient(host='localhost', port=27017)
database = client.forecast
referenced_id = unique_id()
# create referenced collection
energy_carrier_groups = database.energy_carrier_groups
energy_carrier_groups.delete_many({})
energy_carrier_groups.insert_many([
{
'_id': unique_id(),
'name': 'fuels',
'energy_carriers': [
{
'_id': referenced_id,
'name': 'oil'
},
{
'_id': unique_id(),
'name': 'gas'
}
]
},
{
'_id': unique_id(),
'name': 'electricity',
'energy_carriers': [
{
'_id': unique_id(),
'name': 'green electricity'
},
{
'_id': unique_id(),
'name': 'conventional electricity'
}
]
},
])
# create referencing collection
tech = database.tech
tech.delete_many({})
tech.insert_many([
{
'_id': unique_id(),
'name': 'qux',
'energy_carrier': referenced_id
},
])
Expected result of my aggregation:
{
'_id': ObjectId('6183de1b5dd889cfcdeaa711'),
'name': 'qux',
'energy_carrier': {
'_id': ObjectId('6183de1b5dd889cfcdeaa70b'),
'name': 'oil'
}
}
First trial, using a path to the nested document:
pipeline = [
{"$match": {"name": 'qux'}},
{"$lookup": {
"from": "$energy_carrier_groups.energy_carriers", # <= does not work
"localField": "energy_carrier",
"foreignField": "_id",
"as": "energy_carrier"
}
},
{"$unwind": "$energy_carrier"}, # transforms lookup result array to a single entry
]
results = referencing.aggregate(pipeline)
for result in results:
print(result)
print('finished')
Another trial, using let
and pipeline
instead of localField
and foreignField
:
pipeline = [
{"$match": {"name": 'qux'}},
{"$lookup": {
"from": "energy_carrier_groups",
"let": {"tech_energy_carrier_id": "$energy_carrier"},
"pipeline": [
{"$unwind": "$energy_carriers"},
{"$match": {"$expr": {"$eq": ["$$tech_energy_carrier_id", "$energy_carriers._id"]}}}
],
"as": "energy_carrier" # overrides id field with an array wrapping the resolved reference
}
},
{"$unwind": "$energy_carrier"}, # transforms array to a single entry
]
results = tech.aggregate(pipeline)
for result in results:
print(result)
print('finished')
Gives some result but resolves the reference with "filtered energy carrier groups" instead of resolving only the energy carrier.
=> What is the recommended way to resolve the referenced energy carrier for tech?
=> If there is a better No-Sql Database then MongoDb for this porpose, please let me know, too
Related:
MongoDB $lookup on nested document
$lookup on ObjectId's in an array
https://www.stackchief.com/tutorials/%24lookup%20Examples%20%7C%20MongoDB
Upvotes: 1
Views: 619
Reputation: 12260
A. Here is another version, based on my initial trial to use lookup and combined with the addFields trick of turivishal to override/correct the resulting property energy_carrier.
pipeline = [
{"$match": {"name": 'qux'}},
{"$lookup": {
"from": "energy_carrier_groups",
"let": {"energy_carrier_id": "$energy_carrier"}, # executed on tech
"pipeline": [ # executed on energy_carrier_groups, with the knowledge of let definition
{"$unwind": "$energy_carriers"},
{"$match": {"$expr": {"$eq": ["$$energy_carrier_id", "$energy_carriers._id"]}}}
],
"as": "energy_carrier" # already includes what we want but also extra fields
}
},
{"$addFields": { # overrides/corrects the result of the previous stage with parts of it
"energy_carrier": {"$first": "$energy_carrier.energy_carriers"}
}
}
]
results = tech.aggregate(pipeline)
B. Alternatives to MongoDb (e.g. RethinkDb) might be better suited for complex queries
Upvotes: 0
Reputation: 36114
You can try,
let
to pass energy_carrier
id to pipeline$match
to check expression condition using $in
operator, is energy_carrier
in energy_carriers._id
$project
to show required fields$filter
to iterate loop of energy_carriers
array and filter by variable
energy_carrier
passed in let
$first
to get the first element from above filtered result$addFields
and $first
to get the first element from the above lookup resultpipeline = [
{ $match: { name: "qux" } },
{
$lookup: {
from: "energy_carrier_groups",
let: { energy_carrier: "$energy_carrier" },
pipeline: [
{
$match: { $expr: { $in: ["$$energy_carrier", "$energy_carriers._id"] } }
},
{
$project: {
_id: 0,
energy_carriers: {
$first: {
$filter: {
input: "$energy_carriers",
cond: { $eq: ["$$energy_carrier", "$$this._id"] }
}
}
}
}
}
],
as: "energy_carrier"
}
},
{
$addFields: { energy_carrier: { $first: "$energy_carrier.energy_carriers" } }
}
]
results = tech.aggregate(pipeline)
Upvotes: 1