Poni
Poni

Reputation: 11337

mongodb aggregate sort by groupped fields

MongoDB 4.0.

This is the data set (sales-aggregate-test.js):

use Test123;

const HOW_MANY_PRODUCTS = 1000
const HOW_MANY_SALES_PER_PRODUCT = 50

for(let i = 0; i < HOW_MANY_PRODUCTS; i++) {
  const productNumber = (i + 10001)
  const productId = '5bd9d139d96b8fce000' + productNumber
  db.getCollection('products').insert({
    _id: ObjectId(productId),
    title: 'Product ' + productNumber,
  })

  for(let j = 0; j < HOW_MANY_SALES_PER_PRODUCT; j++) {
    const saleNumber = (j + 10001)
    const saleId = '5bd9d139d96b8f' + productNumber + saleNumber
    db.getCollection('sales').insert({
      _id: ObjectId(saleId),
      product: ObjectId(productId),
      quantity: i + j + 1,
    })
  }
}

Insert it with: mongo < ./sales-aggregate-test.js.

Now this is the query (sales-aggregate-test-actual-query.js):

use Test123;

db.getCollection('sales').aggregate(
  [
    {
      $sort: { product: 1, remoteVariantId: 1, quantity: -1, }
    },

    {
      $lookup: {
        from:               'products',
        localField:         'product',
        foreignField:        '_id',
        as:                 'productModel',
      }
    },

    {
      $unwind: '$productModel'
    },

    {
      $match: {
        'productModel.archived': { $ne: true }
      }
    },

    {
      $project: {
        product: 1,
        quantity: 1,
      }
    },

    //{ $limit: 10 },

    {
      $group: {
        _id: '$product',

        saleModelsCount: { $sum: 1 },

        quantity : { $sum: '$quantity' },
      }
    },

    {
      $sort: { quantity: -1, }
    },
    { $limit: 3  },
  ]
  // ,{ allowDiskUse: true }
)

What am trying to achieve? Getting this faster:

{ "_id" : ObjectId("5bd9d139d96b8fce00011000"), "saleModelsCount" : 50, "quantity" : 51225 }
{ "_id" : ObjectId("5bd9d139d96b8fce00010999"), "saleModelsCount" : 50, "quantity" : 51175 }
{ "_id" : ObjectId("5bd9d139d96b8fce00010998"), "saleModelsCount" : 50, "quantity" : 51125 }

This is basically: Give me the best selling product. Since sales include quantity, I need to first group them by quantity and then sort.

Now on this test data set it's "fast" - just 2.5 seconds. The problem is with a real data set, where the product models are much bigger, and more factors involved (like a 'price' field in a sale model).

The issue seems to be caused by both the last $group and $sort stages. Commenting out both returns quickly. Commenting out just one makes the query slow.

How do I make it faster? Open for suggestions - a different approach is also possible.

Upvotes: 1

Views: 48

Answers (1)

mickl
mickl

Reputation: 49985

Few thoughts that might be useful for you:

First of all you can get rid of first $sort as you have another one in the last pipeline stage and that one will guarantee right order.

There are few ways how to replace $lookup + $unwind + $match + $project + $group.

You can use $addFields with $filter to filter out some elements before you $unwind:

{
    $lookup: {
        from: 'products',
        localField: 'product',
        foreignField: '_id',
        as: 'productModel',
    }
},

{
    $addFields: {
        productModel: {
            $filter: {
                input: '$productModel',
                as: 'model',
                cond: { $ne: [ '$$model.archived', true ] }
            }
        }
    }
},

{
    $unwind: '$productModel'
}

In this case you can remove $match since this operation is performed in nested array.

Second way might be to use $lookup with custom pipeline, so that you can perform this additional filtering inside $lookup:

{
    $lookup: {
        from:   'products',
        let: { productId: "$product" },
        pipeline: [
            {
                $match: { $expr: { $and: [ { $eq: [ "$$productId", "$_id" ] }, { $ne: [ "$archived", true ] } ] } }
            }
        ],
        as: 'productModel',
    }
}

As another optimization in both cases you don't need $unwind as your productModel array is filtered and then you can just modify your $group:

{
    $group: {
        _id: '$product',
        saleModelsCount: { $sum: { $size: "$productModel" } },
        quantity : { $sum: '$quantity' },
    }
}

Upvotes: 1

Related Questions