Lev Pevzner

Reputation: 81

Why is my aggregation exceeding maximum document size?

I have a collection called roles and another collection called subjects. Each subject has a field called role_id, which contains the ID of that subject's role. I'm trying to count the number of subjects with each role using this query:

db.roles.aggregate([
    {"$lookup" : { 
        "from": "subjects", 
        "localField": "_id",
        "foreignField": "role_id",
        "as": "subject_matches"
    } },
    {"$project": { "_id": 1, "total_matches": {"$size": "$subject_matches"} } },
    {"$out": "testing"}
], {allowDiskUse: true} )

When I do this, I get the following error:

assert: command failed: {
    "ok" : 0,
    "errmsg" : "Total size of documents in subjects matching { role_id: { $eq: ObjectId('55421027b2fb3e916f0001e9') } } exceeds maximum document size",
    "code" : 4568
} : aggregate failed
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:13:14
assert.commandWorked@src/mongo/shell/assert.js:267:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1312:5
@(shell):1:1

I'm not sure why I'm getting this -- according to the MongoDB documentation, the result size restriction only applies to returned documents, and it can be exceeded during pipeline processing (https://docs.mongodb.com/manual/core/aggregation-pipeline-limits/). Since my returned documents only contain the _id and total_matches fields, it seems like I shouldn't be getting this error.

When I perform this operation on a small subset of my collection, it works, with results looking like this:

{ "_id" : ObjectId("55421026b2fb3e916f000001"), "total_matches" : 208 }
{ "_id" : ObjectId("55421026b2fb3e916f000002"), "total_matches" : 2 }
{ "_id" : ObjectId("55421026b2fb3e916f000003"), "total_matches" : 11 }
{ "_id" : ObjectId("55421026b2fb3e916f000004"), "total_matches" : 0 }
{ "_id" : ObjectId("55421026b2fb3e916f000005"), "total_matches" : 87 }

Any thoughts on why this is happening?

Upvotes: 2

Views: 8128

Answers (2)

Lev Pevzner

Reputation: 81

I ended up working around the problem by first getting the role_id counts from the subject collection into a new role_counts collection:

db.subjects.aggregate([
    {"$group": {
        "_id": "$role_id", 
        "count": {"$sum": 1},
    }},
    {"$out": "role_counts"}
])

And then doing the lookup from roles to role_counts:

db.roles.aggregate([    
    {"$lookup" : { 
        "from": "role_counts", 
        "localField": "_id",
        "foreignField": "_id",
        "as": "role_matches"
        }
    },
    {"$unwind": {"path": "$role_matches", "preserveNullAndEmptyArrays": true}},
    {"$project": {
        "count": "$role_matches.count", "_id": 1
    }}
])

This avoids building the long per-role arrays of subject documents that were pushing individual documents past the size limit: each role now joins against a single small count document instead of thousands of subjects.

Thanks all for your help!

Upvotes: 5

profesor79

Reputation: 9473

$lookup embeds every matching subject into the role document as an array, and that combined document must still fit within the 16 MB BSON single-document limit, even mid-pipeline. The 100 MB per-stage limit (which allowDiskUse relaxes) is a separate restriction and doesn't help here.

So you need to reduce the number or size of the documents that $lookup pulls in for each role.
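If you are on MongoDB 3.6 or later, another way to keep the joined array small is the pipeline form of $lookup with a $count stage, so only the count is materialized instead of the full list of subjects. A sketch (assuming the same roles/subjects collections and role_id field from the question):

```javascript
db.roles.aggregate([
    {"$lookup": {
        "from": "subjects",
        "let": { "roleId": "$_id" },
        "pipeline": [
            // Match subjects for this role, then collapse them to a single count document
            { "$match": { "$expr": { "$eq": ["$role_id", "$$roleId"] } } },
            { "$count": "n" }
        ],
        "as": "subject_counts"
    }},
    {"$project": {
        "_id": 1,
        // $count emits nothing when there are no matches, so default to 0
        "total_matches": { "$ifNull": [ { "$arrayElemAt": ["$subject_counts.n", 0] }, 0 ] }
    }}
])
```

Here subject_counts holds at most one tiny document per role, so the joined document stays far below the BSON limit regardless of how many subjects match.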

Upvotes: 1
