Dheemanth Bhat
Dheemanth Bhat

Reputation: 4452

Does multiple $project stages in MongoDB aggregation affect performance

TL;DR

We add $project stage in between $match and $lookup stage in order to filter out the unnecessary data or aliasing the fields.Those $project stages improve the read ability of the query while debugging but will they affect the performance in any way when there are large number of documents in every collection involved in the query.

Question Detailed

For example I have two collections schools and students as shown below:

Yes the schema design is bad i know! MongoDB says - put everything in same collection to avoid relations but lets continue with this approach for now.

schools collection

{
    "_id": ObjectId("5c04dca4289c601a393d9db8"),
    "name": "First School Name",
    "address": "1 xyz",
    "status": 1,
    // Many more fields
},
{
    "_id": ObjectId("5c04dca4289c601a393d9db9"),
    "name": "Second School Name",
    "address": "2 xyz",
    "status": 1,
    // Many more fields
},
// Many more Schools

student collection

{
    "_id": ObjectId("5c04dcd5289c601a393d9dbb"),
    "name": "One Student Name",
    "school_id": ObjectId("5c04dca4289c601a393d9db8"),
    "address": "1 abc",
    "Gender": "Male",
    // Many more fields
},
{
    "_id": ObjectId("5c04dcd5289c601a393d9dbc"),
    "name": "Second Student Name",
    "school_id": ObjectId("5c04dca4289c601a393d9db9"),
    "address": "1 abc",
    "Gender": "Male",
    // Many more fields
},
// Many more students

Now in my query as shown below i have a $project stage after $match just before $lookup. So is this $project stage necessary? Will this stage affect performance when there are huge number of documents in the all the collections involved in the query?

db.students.aggregate([
    {
        $match: {
            "Gender": "Male"
        }
    },
    // 1. Below $project stage is not necessary apart from filtering out and aliasing.
    // 2. Will this stage affect performance when there are huge number of documents?
    {
        $project: {
            "_id": 0,
            "student_id": "$_id",
            "student_name": "$name",
            "school_id": 1
        }
    },
    {
        $lookup: {
            from: "schools",
            let: {
                "school_id": "$school_id"
            },
            pipeline: [
                {
                    $match: {
                        "status": 1,
                        $expr: {
                            $eq: ["$_id", "$$school_id"]
                        }
                    }
                },
                {
                    $project: {
                        "_id": 0,
                        "name": 1
                    }
                }
            ],
            as: "school"
        }
    },
    {
        $unwind: "$school"
    }
]);

Upvotes: 0

Views: 2577

Answers (1)

snyderL
snyderL

Reputation: 111

Give this a read: https://docs.mongodb.com/v3.2/core/aggregation-pipeline-optimization/

Related to your particular case is The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline will only use those required fields, reducing the amount of data passing through the pipeline.

So, there is some optimization going on behind the scenes. You might try tacking on the explain option to your aggregation to see exactly what mongo is doing to attempt to optimize your pipeline.

I think what you are doing should actually help performance as you are decreasing the amount of data flowing through.

Upvotes: 2

Related Questions