Reputation: 4452
TL;DR
We add $project
stage in between $match
and $lookup
stage in order to filter out the unnecessary data or aliasing the fields.Those $project
stages improve the read ability of the query while debugging but will they affect the performance in any way when there are large number of documents in every collection involved in the query.
Question Detailed
For example I have two collections schools and students as shown below:
Yes the schema design is bad i know! MongoDB says - put everything in same collection to avoid relations but lets continue with this approach for now.
schools collection
{
"_id": ObjectId("5c04dca4289c601a393d9db8"),
"name": "First School Name",
"address": "1 xyz",
"status": 1,
// Many more fields
},
{
"_id": ObjectId("5c04dca4289c601a393d9db9"),
"name": "Second School Name",
"address": "2 xyz",
"status": 1,
// Many more fields
},
// Many more Schools
student collection
{
"_id": ObjectId("5c04dcd5289c601a393d9dbb"),
"name": "One Student Name",
"school_id": ObjectId("5c04dca4289c601a393d9db8"),
"address": "1 abc",
"Gender": "Male",
// Many more fields
},
{
"_id": ObjectId("5c04dcd5289c601a393d9dbc"),
"name": "Second Student Name",
"school_id": ObjectId("5c04dca4289c601a393d9db9"),
"address": "1 abc",
"Gender": "Male",
// Many more fields
},
// Many more students
Now in my query as shown below i have a $project
stage after $match
just before $lookup
.
So is this $project
stage necessary?
Will this stage affect performance when there are huge number of documents in the all the collections involved in the query?
db.students.aggregate([
{
$match: {
"Gender": "Male"
}
},
// 1. Below $project stage is not necessary apart from filtering out and aliasing.
// 2. Will this stage affect performance when there are huge number of documents?
{
$project: {
"_id": 0,
"student_id": "$_id",
"student_name": "$name",
"school_id": 1
}
},
{
$lookup: {
from: "schools",
let: {
"school_id": "$school_id"
},
pipeline: [
{
$match: {
"status": 1,
$expr: {
$eq: ["$_id", "$$school_id"]
}
}
},
{
$project: {
"_id": 0,
"name": 1
}
}
],
as: "school"
}
},
{
$unwind: "$school"
}
]);
Upvotes: 0
Views: 2577
Reputation: 111
Give this a read: https://docs.mongodb.com/v3.2/core/aggregation-pipeline-optimization/
Related to your particular case is
The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline will only use those required fields, reducing the amount of data passing through the pipeline.
So, there is some optimization going on behind the scenes. You might try tacking on the explain option to your aggregation to see exactly what mongo is doing to attempt to optimize your pipeline.
I think what you are doing should actually help performance as you are decreasing the amount of data flowing through.
Upvotes: 2