Reputation: 81
I have a collection called roles
and another collection called subjects
. Each subject has a field called role_id
, which contains the ID of that subject's role. I'm trying to count the number of subjects with each role using this query:
db.roles.aggregate([
{"$lookup" : {
"from": "subjects",
"localField": "_id",
"foreignField": "role_id",
"as": "subject_matches"
} },
{"$project": { "_id": 1, "total_matches": {"$size": "$subject_matches"} } },
{"$out": "testing"}
], {allowDiskUse: true} )
When I do this, I get the following error:
assert: command failed: {
"ok" : 0,
"errmsg" : "Total size of documents in subjects matching { role_id: { $eq: ObjectId('55421027b2fb3e916f0001e9') } } exceeds maximum document size",
"code" : 4568
} : aggregate failed
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:13:14
assert.commandWorked@src/mongo/shell/assert.js:267:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1312:5
@(shell):1:1
I'm not sure why I'm getting this -- according to Mongo documentation, the result size restrictions only apply to returned documents, and this size restriction can be exceeded during pipeline processing (https://docs.mongodb.com/manual/core/aggregation-pipeline-limits/). Since my returned documents only contain the the _id
and total_matches
fields, it seems like I shouldn't be getting this error.
When I perform this operation on a small subset of my collection, it works, with results looking like this:
`{ "_id" : ObjectId("55421026b2fb3e916f000001"), "total_matches" : 208 }
{ "_id" : ObjectId("55421026b2fb3e916f000002"), "total_matches" : 2 }
{ "_id" : ObjectId("55421026b2fb3e916f000003"), "total_matches" : 11 }
{ "_id" : ObjectId("55421026b2fb3e916f000004"), "total_matches" : 0 }
{ "_id" : ObjectId("55421026b2fb3e916f000005"), "total_matches" : 87 }`
Any thoughts on why this is ?
Upvotes: 2
Views: 8128
Reputation: 81
I ended up working around the problem by first getting the role_id counts from the subject collection into a new role_counts collection:
db.subjects.aggregate([
{"$group": {
"_id": "$role_id",
"count": {"$sum": 1},
}},
{"$out": "role_counts"}
])
And then doing the lookup from roles to role_counts:
db.roles.aggregate([
{"$lookup" : {
"from": "role_counts",
"localField": "_id",
"foreignField": "_id",
"as": "role_matches"
}
},
{"$unwind": {"path": "$role_matches", "preserveNullAndEmptyArrays": true}},
{"$project": {
"count": "$role_matches.count", "_id": 1
}}
])
This avoided the long arrays that were causing the document size limit issues.
Thanks all for your help!
Upvotes: 5
Reputation: 9473
That's it you need to decrease document(s) amount in $lookup.....
Upvotes: 1