Kin Cheung
Kin Cheung

Reputation: 860

Use $lookup by referring to multiple values in the parent document

I was trying out the new $lookup pipeline feature available in MongoDb 3.5.8 and wondering how to refer fields in the parent document from the pipeline.

I was trying to compare two fields in the parent document against two fields in the looked up document . Not sure how to do this {$eq : ['$input_doc.field1', '$field1'] }.

 db.input_doc.aggregate([
        {
          $lookup:
            {
              from: "foreign_doc",
              pipeline: [
                            { $project: { 'matched': { $and : [ 
                                { $eq : ["$input_doc.field1", "$field1"] }, 
                                { $eq : ["$input_doc.field2", "$field2"] } 
                                          ]} },
                            { $match : { 'matched' : true } }
              ],
              as: "as_doc"
            }
       }
    ])

Thanks

Upvotes: 3

Views: 1715

Answers (1)

Neil Lunn
Neil Lunn

Reputation: 151230

It's not really clear what you mean here, you might possibly mean using let with the new pipeline option, but it's also likely you mean a completely different case.

The pipeline operation is generally intended for "non-correlated" data retrieval, which is useful in various use cases. This is as opposed to "correlated" data where the localField and foreignField can be applied to "join" between two collections.

As mentioned, this is covered in DOCS-10298

Best demonstrated as an example. Take creation of these collections:

db.related.insert([
  { "a": 1, "b": 2 },
  { "a": 2, "b": 2 },
  { "a": 3, "b": 3 }
])

db.parent.insert({
  "name": "test",
  "b": 2
})

I can use the pipeline and let statements here to test a logical condition on the items of the other collection like so:

db.parent.aggregate([
  { "$lookup": {
    "from": "related",
    "let": {
      "b": "$b"      
    },
    "pipeline": [
      { "$addFields": {
        "matched": { "$eq": [ "$$b", "$b" ] }
      }}
    ],
    "as": "results"
  }}
])

Which will give the result:

{
        "_id" : ObjectId("595332c28965d862ce61f451"),
        "name" : "test",
        "b" : 2,
        "results" : [
                {
                        "_id" : ObjectId("59532b028965d862ce61f44d"),
                        "a" : 1,
                        "b" : 2,
                        "matched" : true
                },
                {
                        "_id" : ObjectId("59532b028965d862ce61f44e"),
                        "a" : 2,
                        "b" : 2,
                        "matched" : true
                },
                {
                        "_id" : ObjectId("59532b028965d862ce61f44f"),
                        "a" : 3,
                        "b" : 3,
                        "matched" : false
                }
        ]
}

That shows the condition was tested against the declared variable in let from the parent document against those from the related collection tested in the pipeline supplied.

This allows you to also use a "logical" filter such as $redact:

db.parent.aggregate([
  { "$lookup": {
    "from": "related",
    "let": {
      "b": "$b"      
    },
    "pipeline": [
      { "$redact": {
        "$cond": {
          "if": { "$eq": [ "$$b", "$b" ] },
          "then": "$$KEEP",
          "else": "$$PRUNE"
        }
      }}
    ],
    "as": "results"
  }}
])

Which returns:

{
        "_id" : ObjectId("595332c28965d862ce61f451"),
        "name" : "test",
        "b" : 2,
        "results" : [
                {
                        "_id" : ObjectId("59532b028965d862ce61f44d"),
                        "a" : 1,
                        "b" : 2
                },
                {
                        "_id" : ObjectId("59532b028965d862ce61f44e"),
                        "a" : 2,
                        "b" : 2
                }
        ]
}

But of course, this is already covered with the existing functionality introduced in MongoDB 3.2 with the normal "correlated" options:

db.parent.aggregate([
  { "$lookup": {
    "from": "related",
    "localField": "b",
    "foreignField": "b",
    "as": "results"
  }}
])

With the same results as above.

Of course if you want "additional conditions", then it's actually most efficient to write using $unwind and $match:

db.parent.aggregate([
  { "$lookup": {
    "from": "related",
    "localField": "b",
    "foreignField": "b",
    "as": "results"
  }},
  { "$unwind": "$results" },
  { "$match": { "results.a": 1  } }
])

This is because the aggregation pipeline options for the following stages after $lookup are actually "hoisted" into the $lookup operation itself. Demonstrated in the "explain" output:

    {
            "$lookup" : {
                    "from" : "related",
                    "as" : "results",
                    "localField" : "b",
                    "foreignField" : "b",
                    "unwinding" : {
                            "preserveNullAndEmptyArrays" : false
                    },
                    "matching" : {
                            "a" : {
                                    "$eq" : 1
                            }
                    }
            }
    }

This shows the options of "unwinding" and "matching" have actually been applied within the $lookup. As yet you cannot just write that directly, but the pipeline combination applies this behavior.

This is in fact to essentially deal with the BSON Limit not being broken as a result of creating an array with entries that would exceed the 16MB cap.

In short, in most cases you generally want the existing behavior and do not need the new options. Yet.

Upvotes: 3

Related Questions