gleb1783

Reputation: 461

MongoDB Aggregation Limit Lookup

I am using $lookup in PyMongo to successfully "join" two collections (this works). The problem is that the second collection I am joining may push the result past the 16 MB BSON document size limit when it returns all of the matching records.

I am looking to use $limit to cap the number of records that are joined under "match_docs", e.g. a maximum of 100 records from "comments" per obj_id:

db.indicators.aggregate([
  {
    "$lookup": {
      "from": "comments",
      "localField": "_id",
      "foreignField": "obj_id",
      "as": "match_docs"
    }
  }
])

I've tried $limit in various places, but it only seems to limit the total number of results overall, not the number of documents joined in.
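
For illustration, the naive placement looks roughly like this in PyMongo (a sketch; the connection string and database name are assumptions). A top-level $limit caps how many indicator documents come back, while each one still joins every matching comment:

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["mydb"]  # assumed connection/DB name

pipeline = [
    {"$lookup": {
        "from": "comments",
        "localField": "_id",
        "foreignField": "obj_id",
        "as": "match_docs",
    }},
    {"$limit": 100},  # limits indicator documents, not the size of match_docs
]
results = list(db.indicators.aggregate(pipeline))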

Upvotes: 15

Views: 23607

Answers (4)

Vipin Pandey

Reputation: 907

This way we can filter the records by using the $limit aggregation stage; pass whatever limit value your requirement calls for. The MongoDB aggregation docs are a useful reference.

db.getCollection('botAnalytics').aggregate([
  {
    $lookup: {
      from: "movie",
      localField: "botKey",
      foreignField: "key",
      as: "botDetails"
    }
  },
  { $match: { applicationId: "15077a8c38657a61b844e6a" } },
  { $limit: 5 }
])

Upvotes: -3

Dan Karbayev

Reputation: 2920

Starting from MongoDB 3.6 you can give $lookup its own sub-pipeline (a correlated subquery) and apply the limit inside it:

db.indicators.aggregate([
{ $lookup: {
  from: 'comments',
  as: 'match_docs',
  let: { indicator_id: '$_id' },
  pipeline: [
    { $match: {
      $expr: { $eq: [ '$obj_id', '$$indicator_id' ] }
    } },
    // { $sort: { createdAt: 1 } }, // add sort if needed (for example, if you want first 100 comments by creation date)
    { $limit: 100 }
  ]
} }
])
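
Since the question uses PyMongo, here is a direct translation of that pipeline (a sketch; the connection string and database name are assumptions):

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["mydb"]  # assumed connection/DB name

pipeline = [
    {"$lookup": {
        "from": "comments",
        "as": "match_docs",
        "let": {"indicator_id": "$_id"},
        "pipeline": [
            # keep only the comments belonging to this indicator
            {"$match": {"$expr": {"$eq": ["$obj_id", "$$indicator_id"]}}},
            # {"$sort": {"createdAt": 1}},  # optional: order before limiting
            {"$limit": 100},  # at most 100 comments per indicator
        ],
    }},
]
for doc in db.indicators.aggregate(pipeline):
    print(doc["_id"], len(doc["match_docs"]))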

Upvotes: 35

gleb1783

Reputation: 461

I was able to figure it out.

$lookup -> $match -> $project

db.indicators.aggregate([
    {"$lookup": {
        "from": "comments",
        "localField": "_id",
        "foreignField": "obj_id",
        "as": "match_docs",
    }},
    {"$match": {
        "match_docs": {"$exists": True},
    }},
    {"$project": {
        "match_docs_agg": {"$slice": ["$match_docs", 3]},
    }},
])
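
Note that $lookup always creates the "as" array (it is just empty when nothing matches), so the $exists match passes every document. If the intent is to keep only indicators that actually have comments, comparing against the empty array would do it (a sketch):

pipeline = [
    {"$lookup": {"from": "comments", "localField": "_id",
                 "foreignField": "obj_id", "as": "match_docs"}},
    # $lookup always produces match_docs (an empty list when nothing matches),
    # so filter on "not an empty array" rather than $exists:
    {"$match": {"match_docs": {"$ne": []}}},
    {"$project": {"match_docs_agg": {"$slice": ["$match_docs", 3]}}},
]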

Upvotes: 2

Pete Garafano

Reputation: 4913

If you do a $unwind immediately following a $lookup, the pipeline will be optimized: the two stages are coalesced into one, which helps bypass the 16 MB document size limit that could otherwise result from the $lookup returning a large number of documents.

Keep in mind that if the size of a single document in the foreign collection plus the size of the matching document in the local collection exceeds 16 MB, this optimization cannot help.
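
In PyMongo the pattern looks like this (a sketch; the connection string and database name are assumptions, and the $unwind must directly follow the $lookup for the server to coalesce the two stages):

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["mydb"]  # assumed connection/DB name

pipeline = [
    {"$lookup": {
        "from": "comments",
        "localField": "_id",
        "foreignField": "obj_id",
        "as": "match_docs",
    }},
    # Immediately after $lookup, $unwind lets the server merge the two
    # stages, so the full match_docs array is never materialized inside
    # a single (potentially >16 MB) document.
    {"$unwind": "$match_docs"},
]
for doc in db.indicators.aggregate(pipeline):
    pass  # each result carries exactly one comment in match_docs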

Upvotes: 3
