Muhammad Hashir Anwaar
Muhammad Hashir Anwaar

Reputation: 614

$SKIP stage contain all previous records

I have a collection of 1.2M documents and I am making a list of them using pagination (25 per page). For that, I am using Aggregation Framework with $sort->$skip->$limit->$project stages respectively. Things are working fine but a wierd behaviour of $skip causing nReturned docs a high value as i keep exploring pages.

For example, assume $limit => 25, Page 1 => nReturned 25, Skip 0, Display 1-25, on Page 2 => nReturned 50, Skip 25, Display 26-50 , on Page 3 => nReturned 75, Skip 50, Display 51-75 and so on... and on Page LAST => nReturned SOME_MILLION, Skip MILLION, Display LAST_MILLIONS

If, I visit last pages, nReturned, totalDocsExamined and totalKeysExamined will be near to 1.2M as i have a lot of record.

Query

db.collection.aggregate([
{
    $sort : {
        date_added : -1
    }
},
{
    $skip : 50  
},
{
    $limit : 25
},
{
    $project : {
        luid : 1,
        name : 1
    }
}
])

Am i doing something wrong or is there any way to optimise this query?

Upvotes: 0

Views: 71

Answers (1)

Someone Special
Someone Special

Reputation: 13588

Using $skip is very memory inefficient, imaging if you are going to the last page, you will be skipping 1.2M records.

Will you really be displaying page 10000 under your pagination or will there be < and > arrows?

A better way would be using the last key, for example records of the following.

{ 
   _id: 1
   _data: ...
},
{   
   _id: 2
   data: ...
}....

At the first page, my last key will be 25, second page, my last key will be 50, or maybe e.g. 62, if my keys are not in order (deleted in between).

My pagination will then be as simple as

Model.find({ _id: { $gt: last }}).limit(25).sort({ _id: 1 })

Where last is the last id retrieved in page 1.

calling for the previous page is as simple as switching $gt to $lt and reverse the sorting

Tip: in most cases nobody will go select page e.g 100204, we will usually have a search function if we do have this many records.

If we really need to use $skip, a caching layer should be in order to cache the records retrieved (e.g. using redis) to prevent repeatedly calling the same query.

Upvotes: 1

Related Questions