Aniket Jha
Aniket Jha

Reputation: 1781

Order of $sort and $limit in mongo db aggregation

Does the order of $skip and $limit matter in the pipeline?

I use mongoose aggregation which runs a pipeline of operations. With skip and limit appended at the end.

   projectModel.aggregate(pipeline)
  .sort({ updatedAt: -1 })
  .skip(skip)
  .limit(limit)

My pipeline simply looks like

$match(userId) > $lookup(html_collection) > $lookup(records_collection) > $sort(on updatedAt from mongoose) > $skip(mongoose) > $limit(mongoose)

What I observed during pagination is that the $limit is being respected but the $skip is happening only at end of the pipeline. For example:

Page1: skip = 0, limit = 10

The number of documents as per the .explain() clearing $match stage is 10.

Page 2: skip = 10, limit = 10

The number of documents clearing $match stage is 20 (skip + limit), and 20 documents made it to the next stage. $lookup is made on 20 documents. Slowing down our pipeline and it is at the last stage when $skip works to discard the first 10 documents. Here we wasted work on the first 10 documents.

This was causing an issue in our pipeline and it was getting slow for pagination.

SOLUTION: What we ended up doing was to move $limit followed by $skip after $match. then $limit = skip + limit, $skip = skip. We thought that limiting documents as limit = skip + limit will fetch documents and $skip at the next stage would reject unnecessary documents, thereby giving $lookup stages only the expected documents.

Page 1: skip = 0, limit = 10 $limit = 0 + 10 = 10 followed by $skip = 0

Page 2: skip = 10, limit = 10 $limit = 10 + 10 followed by $skip = 10

our pipeline now looks like:

$match(userId) > $sort(updatedAt) > $limit(limit + skip) > $skip (skip) > $lookup(html_collection) > $lookup(records_collection)

Here is the sample collection scehema for your refernce:

    PROJECT COLLECTION     
    {
     id: ObjectId,
     records: [ObjectId],
     userId: ObjectId
    }
     
    RECORDS COLLECTION
     
    {
     id: ObjectId,
     text: string
    }
     
    HTML COLLECTION
     
    {
     id: ObjectId,
     html: string,
     projectId: ObjectId
    }

QUESTIONS:

  1. Is this behavior intended or something is wrong here with the $skip and $limit?
  2. Is the solution that we came up with is correct? and will it scale?, as I think for last page there are too many documents clearing the $match stage, but that is also something that MongoDB internally did right...as seen on page 2 of in our case?

Upvotes: 0

Views: 1355

Answers (1)

Branchverse
Branchverse

Reputation: 1397

Yes your solution is the way to go:

  • $skip only ignores the set amount given to it by the pipeline
  • $limit actually only returns the told amount.

So the chaining you did translates into this sentence:

We limit at 20 while skipping the first 10 results in getting the 2nd set of 10 documents.

If you say u limit at 10 and then ignore 10 via skip you have no documents left.

For your 2nd question, yes it will scale, just go ahead and have a format like this:

const baseLimit = 10
const skip = 10
const limit = baseLimit + sort

Upvotes: 1

Related Questions