Reputation: 517
I have two collections:
My goal is try to adjust index or change the query pipeline to optimize the query to answer how many active
users with specific tag tag1
and tagged for at least 2
times during the date
range, my mongo pipeline will be:
[
1 { $match : {
date: { $gte: ..., $lte: ... },
name: 'tag1'
} },
2 { $group: {
_id: '$cid',
totalCount: '$count'
} },
3 { $match: {
totalCount: { $gte: 2 }
} },
4 { $lookup: {
from: 'Customer',
pipeline: [
{ $match: {
"$expr": {
"$and": [
{ "$eq": [ "$_id", "$$cid" ] },
{ "$eq": [ "$state", "active" ] },
]
}
} }
],
as: 'Customer'
} },
5 { "$unwind": "$Customer" },
6 { "$count": "count" }
]
I have tried add index with
{
date : -1.0,
name : 1.0,
cid : 1.0
}
The following is the explain("executionStats")
result without $lookup (stage 4 and 5). The $match
takes only 802 ms, but the $group
part takes the majority of the time (16610 - 5598 = 11012 ms).
[
{
$cursor: {
executionStats: {
executionSuccess: true,
nReturned: 2632494,
executionTimeMillis: 17937,
totalKeysExamined: 2632494,
totalDocsExamined: 2632494,
executionStages: {
stage: 'PROJECTION_SIMPLE',
nReturned: 2632494,
executionTimeMillisEstimate: 802,
inputStage: {
stage: 'FETCH',
nReturned: 2632494,
executionTimeMillisEstimate: 636,
docsExamined: 2632494,
inputStage: {
stage: 'IXSCAN',
nReturned: 2632494,
executionTimeMillisEstimate: 376,
indexName: 'date_1_name_1_cid_1',
direction: 'forward',
indexBounds: {
date: ['[20230101.0, 20231231.0]'],
name: ['["tag1"]'],
cid: ['[MinKey, MaxKey]'],
},
keysExamined: 2632494,
seeks: 1,
},
},
},
},
},
nReturned: NumberLong(2632494),
executionTimeMillisEstimate: NumberLong(5598),
},
{
$group: {
_id: '$cid',
totalCount: { $sum: '$count' },
},
nReturned: NumberLong(2632494),
executionTimeMillisEstimate: NumberLong(16610),
},
{
$match: {
totalCount: { $gte: 2.0 },
},
nReturned: NumberLong(1),
executionTimeMillisEstimate: NumberLong(17537),
},
{
$group: {
_id: { $const: null },
count: { $sum: { $const: 1 } }
},
nReturned: NumberLong(1),
executionTimeMillisEstimate: NumberLong(17933),
},
{
$project: {
count: true,
_id: false
},
nReturned: NumberLong(1),
executionTimeMillisEstimate: NumberLong(17933),
},
];
I also tried to add count
to the last position of the compound index. The result is only a little better. One can find that the stage PROJECTION_SIMPLE
has changed to PROJECTION_COVERED
, the FETCH
stage is gone, and the totalDocsExamined
field shows 0. That is great in that the new index covered all the date required before going into the $group
stage. However, group still take the majority of the time (15033 - 4239 = 10794 ms)
[
{
$cursor: {
executionStats: {
executionTimeMillis: 16285,
totalKeysExamined : 2632494,
totalDocsExamined : 0,
executionStages: {
stage: 'PROJECTION_COVERED',
executionTimeMillisEstimate: 479,
inputStage: {
stage: 'IXSCAN',
executionTimeMillisEstimate: 370,
indexName: 'date_1_name_1_cid_1_count_1',
indexBounds: {
date: ['[20230101.0, 20231231.0]'],
name: ['["tag1"]'],
cid: ['[MinKey, MaxKey]'],
count: ['[MinKey, MaxKey]'],
},
keysExamined: 2632494,
seeks: 1,
},
},
},
},
nReturned: NumberLong(2632494),
executionTimeMillisEstimate: NumberLong(4239),
},
{
$group: {
_id: '$cid',
totalCount: { $sum: '$count' },
},
nReturned: NumberLong(2632494),
executionTimeMillisEstimate: NumberLong(15033),
},
{
$match: {
totalCount: { $gte: 2.0 },
},
nReturned: NumberLong(1),
executionTimeMillisEstimate: NumberLong(15863),
},
{
$group: {
_id: { $const: null },
count: { $sum: { $const: 1 } }
},
nReturned: NumberLong(1),
executionTimeMillisEstimate: NumberLong(16285),
},
{
$project: {
count: true,
_id: false
},
nReturned: NumberLong(1),
executionTimeMillisEstimate: NumberLong(16285),
},
]
Is there any other way to improve the time?
This part drives my CRAZY. I have added the following index on Customer collection and hope that the lookup can fully utilize index.
{
_id: 1.0,
state: 1.0,
}
When $lookup added, it takes additional 766,596 - 14,391 = 752,205 ms to complete (OMG...). In theory, since I have compound index in Customer with all required field for lookup match, the lookup should be completed by IXSCAN. And because it requires no addition field to unwind and count, I think it is possible to complete the query without fetching any documents. Since I found no way to let explain to show what happened on $lookup stage. I do not know where I can improve.
[
...
{
$match: {
totalDensity: { $gte: 1.0 },
},
nReturned: NumberLong(2632494),
executionTimeMillisEstimate: NumberLong(14391),
},
{
$lookup: { ... },
nReturned: NumberLong(2313852),
executionTimeMillisEstimate: NumberLong(766596),
},
{
$group: {
_id: { $const: null },
count: { $sum: { $const: 1 } },
},
nReturned: NumberLong(1),
executionTimeMillisEstimate: NumberLong(766596),
},
{
$project: {
count: true,
_id: false,
},
nReturned: NumberLong(1),
executionTimeMillisEstimate: NumberLong(766596),
}
]
Upvotes: 1
Views: 59
Reputation: 59456
You don't need { "$unwind": "$Customer" }
. This one should be faster:
{ $project: { size: { $size: "$Customer" } } },
{ $group: { _id: null, count: { $sum: "$size" } } }
Upvotes: 1