Reputation: 11337
MongoDB 4.0.
This is the data set (sales-aggregate-test.js):
use Test123;

const HOW_MANY_PRODUCTS = 1000
const HOW_MANY_SALES_PER_PRODUCT = 50

for (let i = 0; i < HOW_MANY_PRODUCTS; i++) {
  const productNumber = (i + 10001)
  const productId = '5bd9d139d96b8fce000' + productNumber
  db.getCollection('products').insert({
    _id: ObjectId(productId),
    title: 'Product ' + productNumber,
  })
  for (let j = 0; j < HOW_MANY_SALES_PER_PRODUCT; j++) {
    const saleNumber = (j + 10001)
    const saleId = '5bd9d139d96b8f' + productNumber + saleNumber
    db.getCollection('sales').insert({
      _id: ObjectId(saleId),
      product: ObjectId(productId),
      quantity: i + j + 1,
    })
  }
}
Insert it with: mongo < ./sales-aggregate-test.js.
Now this is the query (sales-aggregate-test-actual-query.js):
use Test123;

db.getCollection('sales').aggregate(
  [
    {
      $sort: { product: 1, remoteVariantId: 1, quantity: -1, }
    },
    {
      $lookup: {
        from: 'products',
        localField: 'product',
        foreignField: '_id',
        as: 'productModel',
      }
    },
    {
      $unwind: '$productModel'
    },
    {
      $match: {
        'productModel.archived': { $ne: true }
      }
    },
    {
      $project: {
        product: 1,
        quantity: 1,
      }
    },
    //{ $limit: 10 },
    {
      $group: {
        _id: '$product',
        saleModelsCount: { $sum: 1 },
        quantity: { $sum: '$quantity' },
      }
    },
    {
      $sort: { quantity: -1, }
    },
    { $limit: 3 },
  ]
  // ,{ allowDiskUse: true }
)
What am I trying to achieve? Getting this result faster:
{ "_id" : ObjectId("5bd9d139d96b8fce00011000"), "saleModelsCount" : 50, "quantity" : 51225 }
{ "_id" : ObjectId("5bd9d139d96b8fce00010999"), "saleModelsCount" : 50, "quantity" : 51175 }
{ "_id" : ObjectId("5bd9d139d96b8fce00010998"), "saleModelsCount" : 50, "quantity" : 51125 }
This is basically: give me the best-selling products. Since sales include a quantity, I need to first group them by product (summing the quantities) and then sort.
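To make the intent concrete, the final $group / $sort / $limit stages compute something like the plain-JavaScript sketch below (the sales array is hypothetical sample data, not the real collection, and the archived filter is ignored here):

```javascript
// In-memory analogue of the $group / $sort / $limit stages (illustrative only).
const sales = [
  { product: 'A', quantity: 3 },
  { product: 'A', quantity: 4 },
  { product: 'B', quantity: 10 },
];

// $group: { _id: '$product', saleModelsCount: { $sum: 1 }, quantity: { $sum: '$quantity' } }
const grouped = {};
for (const sale of sales) {
  const g = grouped[sale.product] ||
    (grouped[sale.product] = { _id: sale.product, saleModelsCount: 0, quantity: 0 });
  g.saleModelsCount += 1;
  g.quantity += sale.quantity;
}

// $sort: { quantity: -1 } followed by $limit: 3
const top3 = Object.values(grouped)
  .sort((a, b) => b.quantity - a.quantity)
  .slice(0, 3);
```

Here top3 starts with product 'B' (quantity 10), followed by 'A' (two sales, quantity 7).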
Now on this test data set it's "fast" - just 2.5 seconds. The problem is with the real data set, where the product models are much bigger and more fields are involved (like a 'price' field on the sale model).
The issue seems to be caused by the last $group and $sort stages together. Commenting out both makes the query return quickly; commenting out just one still leaves it slow.
How do I make it faster? I'm open to suggestions - a different approach is also possible.
Upvotes: 1
Views: 48
Reputation: 49985
A few thoughts that might be useful:
First of all, you can get rid of the first $sort, since you have another one near the end of the pipeline and that one guarantees the final order.
There are a few ways to replace the $lookup + $unwind + $match + $project + $group sequence.
You can use $addFields with $filter to filter out some elements before you $unwind:
{
  $lookup: {
    from: 'products',
    localField: 'product',
    foreignField: '_id',
    as: 'productModel',
  }
},
{
  $addFields: {
    productModel: {
      $filter: {
        input: '$productModel',
        as: 'model',
        cond: { $ne: [ '$$model.archived', true ] }
      }
    }
  }
},
{
  $unwind: '$productModel'
}
In this case you can remove the $match stage, since the filtering is already performed on the nested array.
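As a quick sanity check of the $filter condition, here is a plain-JavaScript analogue (the sample productModel array is hypothetical; in the pipeline it is the $lookup result):

```javascript
// Plain-JS analogue of cond: { $ne: ['$$model.archived', true] }.
const productModel = [
  { _id: 1, archived: true },
  { _id: 2 },                  // no 'archived' field at all
  { _id: 3, archived: false },
];

// Keep every model that is not explicitly archived.
const kept = productModel.filter(model => model.archived !== true);
```

Note that a model with no archived field is kept, matching the $ne semantics of the original 'productModel.archived': { $ne: true } match.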
A second way is to use $lookup with a custom pipeline, so that you can perform this additional filtering inside the $lookup:
{
  $lookup: {
    from: 'products',
    let: { productId: "$product" },
    pipeline: [
      {
        $match: {
          $expr: {
            $and: [
              { $eq: [ "$$productId", "$_id" ] },
              { $ne: [ "$archived", true ] }
            ]
          }
        }
      }
    ],
    as: 'productModel',
  }
}
As a further optimization, in both cases you don't need $unwind, since your productModel array is already filtered; you can just modify your $group:
{
  $group: {
    _id: '$product',
    saleModelsCount: { $sum: { $size: "$productModel" } },
    quantity: { $sum: '$quantity' },
  }
}
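Putting these suggestions together, the whole aggregation might look like the sketch below. This is untested against a live deployment, and the extra $match on a non-empty productModel is my own addition: without $unwind, a sale whose product is archived would still contribute to the quantity sum, so filtering out empty lookup results keeps the semantics of the original $unwind + $match.

```javascript
// Sketch of the combined pipeline (not run against a live server).
const pipeline = [
  {
    $lookup: {
      from: 'products',
      let: { productId: '$product' },
      pipeline: [
        { $match: { $expr: { $and: [
          { $eq: ['$$productId', '$_id'] },
          { $ne: ['$archived', true] },
        ] } } },
      ],
      as: 'productModel',
    },
  },
  // My addition: drop sales whose product is archived (empty lookup result),
  // so their quantity is not summed, mirroring the original $unwind + $match.
  { $match: { productModel: { $ne: [] } } },
  {
    $group: {
      _id: '$product',
      saleModelsCount: { $sum: { $size: '$productModel' } },
      quantity: { $sum: '$quantity' },
    },
  },
  { $sort: { quantity: -1 } },
  { $limit: 3 },
];
```

You would pass this array to db.getCollection('sales').aggregate(pipeline), optionally with { allowDiskUse: true } as in the question.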
Upvotes: 1