Wonka
Wonka

Reputation: 8674

Cursor based pagination without offset?

For a large data set, offset based pagination becomes slow, and so a much quicker way is to use cursor based pagination. Basically, an anchor point, where the database knows to lookup results from that point onward. With that in mind, here is the issue I am facing:

I have a tabletv_watchers with an autoincrementing id, mins_watching_tv, and user_id (20 rows total fiddle below). In this example user_id will be the same 1, so no need to worry about it. We want to sort by the number of minutes spent watching tv, from highest to lowest.

This is easily done using this query:

SELECT * FROM tv_watchers
ORDER BY mins_watching_tv DESC, id ASC

This will return the correct order of the 20 fields ordered as we want it in this fashion by id:

2, 17, 1, 16, 15, 5, 6, 7, 8, 9, 10, 11, 12, 13, 20, 3, 4, 14, 19, 18

The issue is we want to split it up into chunks (we call them batches) of 5 since we want to return 5 results in the order above. We do this by retrieving the first 6 results, returning the first 5 to the user, and using the 6th if it exists as the cursor (anchor point) to get the next batch from: This returns the first batch correctly:

-- (Batch 1) 2, 17, 1, 16, 15, 5
SELECT * FROM tv_watchers
ORDER BY mins_watching_tv DESC, id ASC
LIMIT 6

The 6th item here is id 5 which has a mins_watching_tv of 60, so since this is the cursor we use it to get the next 6 like this:

-- (Batch 2) 5, 6, 7, 8, 9, 10
SELECT * FROM tv_watchers
WHERE mins_watching_tv <= 60 OR id=5
ORDER BY mins_watching_tv DESC, id ASC
LIMIT 6

The 6th item here is id 10 which also has a mins_watching_tv of 60, so since this is the cursor we use it to get the next 6 like this:

-- (Batch 3 should be) 10, 11, 12, 13, 20, 3
-- (Batch 3 returns incorrectly) 5, 6, 7, 8, 9, 10
SELECT * FROM tv_watchers
WHERE mins_watching_tv <= 60 OR id=10
ORDER BY mins_watching_tv DESC, id ASC
LIMIT 6

But the issue is the results that come back are not correct, it returns the incorrect batch 3 ids seen in the comment above. I am sure it has to do with the WHERE portion, it seems to pick up the mins_watching_tv <= 60 part but the id=10 part is there to let the database know to pick up results from that anchor point of 60 minutes and id 10, but that does not work correctly.

The final batch results should look like this:

-- (Batch 4) 3, 4, 14, 19, 18

I set up an sql fiddle here to show the issue. How can we fix the query so that it respects the cursor combination of mins_watching_tv in combination with the id to return the correct results in batches?

Upvotes: 5

Views: 2893

Answers (2)

sticky bit
sticky bit

Reputation: 37472

  1. Select your first 6 like you already did, without anything in the WHERE.

    SELECT *
           FROM tv_watchers
           ORDER BY mins_watching_tv DESC,
                    id ASC
           LIMIT 6;
    
  2. Duration @duration and the ID @id of the last row of the result from the previous step and put them into the WHERE like

    SELECT *
           FROM tv_watchers
           WHERE mins_watching_tv < @duration
                  OR mins_watching_tv = @duration
                     AND id >= @id
           ORDER BY mins_watching_tv DESC,
                    id ASC
           LIMIT 6;
    
  3. Repeat 2. until end is reached.

Explanation:

  • If mins_watching_tv < @duration we can be certain, that the respective row isn't in our previous result as mins_watching_tv is less then the minimum @duration from our previous result and we did a ORDER BY mins_watching_tv DESC.
  • If mins_watching_tv = @duration we don't know yet if we already had the row. But as we additionally did an ORDER BY id ASC, we know that all rows we already had with the same mins_watching_tv have an id less than or equal the current maximum @id (per mins_watching_tv). So we only want those rows where id > @id or, as we also want the last row of the previous result repeated, id = @id. In short that's id >= @id.

As we want the union of both of those sets, we have to disjunct the above predicates, so use an OR. We get (parenthesis only for clarity, they're not needed):

(mins_watching_tv < @duration)
 OR (mins_watching_tv = @duration
     AND id >= @id)

And here is the fiddle.

Upvotes: 4

Uueerdo
Uueerdo

Reputation: 15941

I only skimmed but I think you just need to tweak your conditions to (for example)

mins_watching_tv < 60 OR (mins_watching_tv = 60 AND id>=5)

Upvotes: 2

Related Questions