Stefan
Stefan

Reputation: 12260

How to use $lookup to resolve reference to nested document?

I have energy_carrier documents that are nested in a collection energy_ carrier_groups. I reference those energy_carrier documents from another collection tech and would like to resolve the reference with a $lookup aggregation.

=> How can I define a sub query inside $lookup that preprocesses /unwinds the energy carriers before I do the actual join/lookup?

My preferred way would be to specify a path for the from or foreignField option to target the nested documents of the energy_carrier_groups collection:

"from": "energy_carrier_groups.energy_carriers"

or

"from": "energy_carrier_groups"
"foreignField": "energy_carriers._id". 

However, that does not seem to work.

I found that $lookup supports let and pipeline arguments as an alternative to the options localField and foreignField (since Version 3.6) and that might be the way to go.

https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#join-conditions-and-subqueries-on-a-joined-collection

Since version 5.0 its also possible to combine all four options:

https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#correlated-subqueries-using-concise-syntax

(A requirement for mongodb version 5.0 is a CPU that supports AVX and mine doesn't.)

All that options make knots in my head! Could you please tell me how I should formulate the pipeline to resolve my reference from tech to energy carrier? Can't be that difficult, could it?

Code to create some example data:

import bson
from pymongo import MongoClient


def unique_id():
    return bson.objectid.ObjectId()


client = MongoClient(host='localhost', port=27017)
database = client.forecast

referenced_id = unique_id()

# create referenced collection
energy_carrier_groups = database.energy_carrier_groups
energy_carrier_groups.delete_many({})
energy_carrier_groups.insert_many([
    {
        '_id': unique_id(),
        'name': 'fuels',
        'energy_carriers': [
            {
                '_id': referenced_id,
                'name': 'oil'
            },
            {
                '_id': unique_id(),
                'name': 'gas'
            }
        ]
    },
    {
        '_id': unique_id(),
        'name': 'electricity',
        'energy_carriers': [
            {
                '_id': unique_id(),
                'name': 'green electricity'
            },
            {
                '_id': unique_id(),
                'name': 'conventional electricity'
            }
        ]
    },

])

# create referencing collection
tech = database.tech
tech.delete_many({})
tech.insert_many([
    {
        '_id': unique_id(),
        'name': 'qux',
        'energy_carrier': referenced_id
    },

])

Expected result of my aggregation:

{
    '_id': ObjectId('6183de1b5dd889cfcdeaa711'), 
    'name': 'qux', 
    'energy_carrier': {
        '_id': ObjectId('6183de1b5dd889cfcdeaa70b'), 
        'name': 'oil'
    }
}

First trial, using a path to the nested document:

pipeline = [
    {"$match": {"name": 'qux'}},
    {"$lookup": {
       "from": "$energy_carrier_groups.energy_carriers", # <= does not work 
       "localField": "energy_carrier",
       "foreignField": "_id",
       "as": "energy_carrier"
      }
    },
    {"$unwind": "$energy_carrier"},  # transforms lookup result array to a single entry
]
results = referencing.aggregate(pipeline)

for result in results:
    print(result)

print('finished')

Another trial, using let and pipeline instead of localField and foreignField:

pipeline = [
    {"$match": {"name": 'qux'}},
    {"$lookup": {
       "from": "energy_carrier_groups",
       "let": {"tech_energy_carrier_id": "$energy_carrier"},
       "pipeline": [
           {"$unwind": "$energy_carriers"},
           {"$match": {"$expr": {"$eq": ["$$tech_energy_carrier_id", "$energy_carriers._id"]}}}
       ],
       "as": "energy_carrier"  # overrides id field with an array wrapping the resolved reference
      }
    },
    {"$unwind": "$energy_carrier"},  # transforms array to a single entry
]
results = tech.aggregate(pipeline)

for result in results:
    print(result)

print('finished')

Gives some result but resolves the reference with "filtered energy carrier groups" instead of resolving only the energy carrier.

=> What is the recommended way to resolve the referenced energy carrier for tech?

=> If there is a better No-Sql Database then MongoDb for this porpose, please let me know, too

Related:

https://softwarerecs.stackexchange.com/questions/81175/is-there-an-alternative-to-mongodb-that-allows-to-easily-resolve-document-refere

MongoDB $lookup on nested document

$lookup on ObjectId's in an array

https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#join-conditions-and-subqueries-on-a-joined-collection

https://www.stackchief.com/tutorials/%24lookup%20Examples%20%7C%20MongoDB

Upvotes: 1

Views: 619

Answers (2)

Stefan
Stefan

Reputation: 12260

A. Here is another version, based on my initial trial to use lookup and combined with the addFields trick of turivishal to override/correct the resulting property energy_carrier.

pipeline = [
    {"$match": {"name": 'qux'}},
    {"$lookup": {
       "from": "energy_carrier_groups",
       "let": {"energy_carrier_id": "$energy_carrier"},  # executed on tech
       "pipeline": [  # executed on energy_carrier_groups, with the knowledge of let definition
           {"$unwind": "$energy_carriers"},
           {"$match": {"$expr": {"$eq": ["$$energy_carrier_id", "$energy_carriers._id"]}}}
       ],
       "as": "energy_carrier"  # already includes what we want but also extra fields
      }
    },
    {"$addFields": {  # overrides/corrects the result of the previous stage with parts of it
        "energy_carrier": {"$first": "$energy_carrier.energy_carriers"}
    }
  }
]
results = tech.aggregate(pipeline)

B. Alternatives to MongoDb (e.g. RethinkDb) might be better suited for complex queries

Also see https://softwarerecs.stackexchange.com/questions/81175/is-there-an-alternative-to-mongodb-that-allows-to-easily-resolve-document-refere

Upvotes: 0

turivishal
turivishal

Reputation: 36114

You can try,

  • let to pass energy_carrier id to pipeline
  • $match to check expression condition using $in operator, is energy_carrier in energy_carriers._id
  • $project to show required fields
  • $filter to iterate loop of energy_carriers array and filter by variable energy_carrier passed in let
  • $first to get the first element from above filtered result
  • $addFields and $first to get the first element from the above lookup result
pipeline = [
  { $match: { name: "qux" } },
  {
    $lookup: {
      from: "energy_carrier_groups",
      let: { energy_carrier: "$energy_carrier" },
      pipeline: [
        {
          $match: { $expr: { $in: ["$$energy_carrier", "$energy_carriers._id"] } }
        },
        {
          $project: {
            _id: 0,
            energy_carriers: {
              $first: {
                $filter: {
                  input: "$energy_carriers",
                  cond: { $eq: ["$$energy_carrier", "$$this._id"] }
                }
              }
            }
          }
        }
      ],
      as: "energy_carrier"
    }
  },
  {
    $addFields: { energy_carrier: { $first: "$energy_carrier.energy_carriers" } }
  }
]

results = tech.aggregate(pipeline)

Playground

Upvotes: 1

Related Questions