user1817188

Reputation: 517

Optimize query and index for mongo $group, $lookup and $count query

I have two collections:

My goal is to adjust the index, or change the query pipeline, to optimize a query that answers: how many active customers were tagged with a specific tag `tag1` at least 2 times during a date range? My mongo pipeline is:

[
    /* 1 */ { $match : {
        date: { $gte: ..., $lte: ... },
        name: 'tag1'
    } },
    /* 2 */ { $group: {
        _id: '$cid',
        totalCount: { $sum: '$count' }
    } },
    /* 3 */ { $match: {
        totalCount: { $gte: 2 }
    } },
    /* 4 */ { $lookup: {
        from: 'Customer',
        let: { cid: '$_id' },
        pipeline: [
            { $match: {
                "$expr": {
                    "$and": [
                        { "$eq": [ "$_id", "$$cid" ] },
                        { "$eq": [ "$state", "active" ] }
                    ]
                }
            } }
        ],
        as: 'Customer'
    } },
    /* 5 */ { "$unwind": "$Customer" },
    /* 6 */ { "$count": "count" }
]
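To make the intent of stages 1–3 and 6 concrete, here is a plain-JavaScript sketch of the same logic over a few hypothetical sample documents (the field names match the pipeline; the data is made up): filter by date range and tag name, sum `count` per `cid`, keep the cids whose total is at least 2, then count them.

```javascript
// Hypothetical sample tag documents (cid = customer id).
const tags = [
  { cid: 'c1', name: 'tag1', date: 20230105, count: 1 },
  { cid: 'c1', name: 'tag1', date: 20230210, count: 2 },
  { cid: 'c2', name: 'tag1', date: 20230301, count: 1 },
  { cid: 'c2', name: 'tag2', date: 20230302, count: 5 }, // wrong tag, filtered out
  { cid: 'c3', name: 'tag1', date: 20240101, count: 9 }, // outside range, filtered out
];

// Stage 1: $match on date range and tag name.
const matched = tags.filter(
  t => t.date >= 20230101 && t.date <= 20231231 && t.name === 'tag1'
);

// Stage 2: $group by cid, summing count.
const totals = new Map();
for (const t of matched) {
  totals.set(t.cid, (totals.get(t.cid) ?? 0) + t.count);
}

// Stages 3 and 6: keep cids with total >= 2, then count them.
const result = [...totals.values()].filter(total => total >= 2).length;
console.log(result); // 1 -- only c1 (1+2=3) qualifies; c2 totals 1
```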

For the first part (stages 1–3)

I have tried adding this index:

{
    date : -1.0,
    name : 1.0,
    cid : 1.0
}

The following is the explain("executionStats") result without the $lookup and $unwind (stages 4 and 5). The cursor stage (index scan, fetch and projection) reports only 802 ms, but the $group stage takes the majority of the time (16610 - 5598 = 11012 ms).

[
    {
        $cursor: {
            executionStats: {
                executionSuccess: true,
                nReturned: 2632494,
                executionTimeMillis: 17937,
                totalKeysExamined: 2632494,
                totalDocsExamined: 2632494,
                executionStages: {
                    stage: 'PROJECTION_SIMPLE',
                    nReturned: 2632494,
                    executionTimeMillisEstimate: 802,
                    inputStage: {
                        stage: 'FETCH',
                        nReturned: 2632494,
                        executionTimeMillisEstimate: 636,
                        docsExamined: 2632494,
                        inputStage: {
                            stage: 'IXSCAN',
                            nReturned: 2632494,
                            executionTimeMillisEstimate: 376,
                            indexName: 'date_1_name_1_cid_1',
                            direction: 'forward',
                            indexBounds: {
                                date: ['[20230101.0, 20231231.0]'],
                                name: ['["tag1"]'],
                                cid: ['[MinKey, MaxKey]'],
                            },
                            keysExamined: 2632494,
                            seeks: 1,
                        },
                    },
                },
            },
        },
        nReturned: NumberLong(2632494),
        executionTimeMillisEstimate: NumberLong(5598),
    },
    {
        $group: {
            _id: '$cid',
            totalCount: { $sum: '$count' },
        },
        nReturned: NumberLong(2632494),
        executionTimeMillisEstimate: NumberLong(16610),
    },
    {
        $match: {
            totalCount: { $gte: 2.0 },
        },
        nReturned: NumberLong(1),
        executionTimeMillisEstimate: NumberLong(17537),
    },
    {
        $group: {
            _id: { $const: null },
            count: { $sum: { $const: 1 } }
        },
        nReturned: NumberLong(1),
        executionTimeMillisEstimate: NumberLong(17933),
    },
    {
        $project: { 
            count: true, 
            _id: false
        },
        nReturned: NumberLong(1),
        executionTimeMillisEstimate: NumberLong(17933),
    },
];

I also tried adding count at the last position of the compound index. The result is only a little better. One can see that the PROJECTION_SIMPLE stage has changed to PROJECTION_COVERED, the FETCH stage is gone, and the totalDocsExamined field shows 0. That is great in that the new index covers all the data required before entering the $group stage. However, $group still takes the majority of the time (15033 - 4239 = 10794 ms):

[
    {
        $cursor: {
            executionStats: {
                executionTimeMillis: 16285,
                totalKeysExamined : 2632494,
                totalDocsExamined : 0,
                executionStages: {
                    stage: 'PROJECTION_COVERED',
                    executionTimeMillisEstimate: 479,
                    inputStage: {
                        stage: 'IXSCAN',
                        executionTimeMillisEstimate: 370,
                        indexName: 'date_1_name_1_cid_1_count_1',
                        indexBounds: {
                            date: ['[20230101.0, 20231231.0]'],
                            name: ['["tag1"]'],
                            cid: ['[MinKey, MaxKey]'],
                            count: ['[MinKey, MaxKey]'],
                        },
                        keysExamined: 2632494,
                        seeks: 1,
                    },
                },
            },
        },
        nReturned: NumberLong(2632494),
        executionTimeMillisEstimate: NumberLong(4239),
    },
    {
        $group: {
            _id: '$cid',
            totalCount: { $sum: '$count' },
        },
        nReturned: NumberLong(2632494),
        executionTimeMillisEstimate: NumberLong(15033),
    },
    {
        $match: {
            totalCount: { $gte: 2.0 },
        },
        nReturned: NumberLong(1),
        executionTimeMillisEstimate: NumberLong(15863),
    },
    {
        $group: {
            _id: { $const: null },
            count: { $sum: { $const: 1 } }
        },
        nReturned: NumberLong(1),
        executionTimeMillisEstimate: NumberLong(16285),
    },
    {
        $project: { 
            count: true, 
            _id: false
        },
        nReturned: NumberLong(1),
        executionTimeMillisEstimate: NumberLong(16285),
    },
]

Is there any other way to improve the time?

For the second part (all stages)

This part drives me CRAZY. I have added the following index on the Customer collection, hoping that the lookup can fully utilize it:

{
    _id: 1.0,
    state: 1.0,
}

With the $lookup added, the pipeline takes an additional 766,596 - 14,391 = 752,205 ms to complete (OMG...). In theory, since I have a compound index on Customer covering all the fields required by the lookup's match, the lookup should be satisfied by an IXSCAN. And because the $unwind and $count need no additional fields, I think it should be possible to complete the query without fetching any documents. But I found no way to make explain show what happens inside the $lookup stage, so I do not know where I can improve.

[
    ...
    {
        $match: {
totalCount: { $gte: 1.0 },
        },
        nReturned: NumberLong(2632494),
        executionTimeMillisEstimate: NumberLong(14391),
    },
    {
        $lookup: { ... },
        nReturned: NumberLong(2313852),
        executionTimeMillisEstimate: NumberLong(766596),
    },
    {
        $group: {
            _id: { $const: null },
            count: { $sum: { $const: 1 } },
        },
        nReturned: NumberLong(1),
        executionTimeMillisEstimate: NumberLong(766596),
    },
    {
        $project: {
            count: true,
            _id: false,
        },
        nReturned: NumberLong(1),
        executionTimeMillisEstimate: NumberLong(766596),
    }
]

Upvotes: 1

Views: 59

Answers (1)

Wernfried Domscheit

Reputation: 59456

You don't need { "$unwind": "$Customer" }. This one should be faster:

{ $project: { size: { $size: "$Customer" } } },
{ $group: { _id: null, count: { $sum: "$size" } } }
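The equivalence is easy to check: summing the sizes of the joined arrays gives the same number as unwinding them and counting the resulting documents, but without materializing one document per array element. A plain-JavaScript illustration over made-up post-$lookup documents:

```javascript
// Made-up documents as they might look after the $lookup stage:
// each has a Customer array with zero or one matched customers.
const afterLookup = [
  { _id: 'c1', Customer: [{ _id: 'c1', state: 'active' }] },
  { _id: 'c2', Customer: [] },                              // no active customer
  { _id: 'c3', Customer: [{ _id: 'c3', state: 'active' }] },
];

// $unwind + $count: one output document per array element.
const unwindCount = afterLookup.flatMap(d => d.Customer).length;

// $project $size + $group $sum: sum of the array lengths.
const sizeSum = afterLookup.reduce((acc, d) => acc + d.Customer.length, 0);

console.log(unwindCount, sizeSum); // 2 2
```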

Upvotes: 1
