light24bulbs
light24bulbs

Reputation: 3151

SQL pagination based on last record retrieved

I need to implement pagination which is semi-resilient to data changing between paginations. The standard pagination relies on SQL's LIMIT and OFFSET, however offset has potential to become inaccurate as new data points are created or their ranking shifts in the sort.

One idea is to hold onto the last data point requested from the API and get the following elements. I don't really know SQL (we're using postgres), but this is my (certainly flawed) attempt at doing something like that. I am trying to store the position of the last element as 'rownum' and then use it in the following query.

WITH rownum AS (
 SELECT *, ROW_NUMBER() OVER (ORDER BY rank ASC, id) AS rownum 
 WHERE id = #{after_id}
 FROM items )
SELECT * FROM items
OFFSET rownum
ORDER BY rank ASC, id
LIMIT #{pagination_limit}

I can see some issues with this, like if the last item changes significantly in rank. If anyone can think of another way to do this, that would be great. But I would like to confine it to a single DB query if possible since this is the applications most frequently hit API.

Upvotes: 2

Views: 2762

Answers (2)

light24bulbs
light24bulbs

Reputation: 3151

I now think the best way to solve this problem is by storing the datetime of the original query and filtering out results after that moment on subsequent queries, thus ensuring the offset is mostly correct. Maybe a persistent database could be used to ensure that the data is at the same state it was when the original query was made.

Upvotes: 1

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656814

Your whole syntax doesn't quite work. OFFSET comes after ORDER BY. FROM comes before WHERE etc.

This simpler query would do what I think your code is supposed to do:

SELECT *
FROM   items
WHERE (rank, id) > (
   SELECT (rank, id)
   FROM   items
   WHERE  id = #{after_id}
   )
ORDER  BY rank, id
LIMIT  #{pagination_limit};

Comparing the composite type (rank, id) guarantees identical sort order.

Make sure you have two indexes:

  • A multicolumn index on (rank, id).

  • Another one on just (id) - you probably have a pk constraint on the column doing that already. (A multicolumn index with leading id would do the job as well.)

More about indexes:
Is a composite index also good for queries on the first field?

If rank is not volatile it would be more efficient to parameterize it additionally instead of retrieving it dynamically - but the volatility of rank seems to be the point of your deliberations ...

Upvotes: 2

Related Questions