madhur

Reputation: 1001

Pagination with the ability to jump to non-adjacent pages, without OFFSET/LIMIT

I am using PostgreSQL and am trying to paginate query results. The two standard methods fail to meet my needs (both are sketched below):

  1. Offset/limit - because of the cost at high offsets
  2. Seek - because of the inability to jump to non-adjacent pages
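
For reference, here is roughly what the two methods look like (a sketch; table_name and the boundary values are placeholders):

-- 1. Offset/limit: the server must still produce and discard every skipped row
SELECT * FROM table_name ORDER BY id OFFSET 100 LIMIT 100;

-- 2. Seek (keyset): fast via an index on id, but it needs the last id of the
--    previous page, so it cannot jump straight to an arbitrary page
SELECT * FROM table_name WHERE id > 100 ORDER BY id LIMIT 100;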

Then I came across the ROW_NUMBER() method. The code is below:

DECLARE @row_per_page INT = 100
DECLARE @page_number INT = 2

SELECT * FROM
    (SELECT ROW_NUMBER() OVER (ORDER BY [ID]) AS [RowNumber], *
     FROM table_name) AS T
WHERE T.[RowNumber] > (@page_number - 1) * @row_per_page
  AND T.[RowNumber] < @page_number * @row_per_page + 1
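
Note that this snippet is SQL Server syntax (DECLARE @var and bracketed identifiers don't work in PostgreSQL); in PostgreSQL I would write roughly the equivalent, with the parameter values inlined:

SELECT *
FROM (SELECT row_number() OVER (ORDER BY id) AS rownumber, *
      FROM table_name) AS t
WHERE t.rownumber > (2 - 1) * 100    -- (page_number - 1) * rows_per_page
  AND t.rownumber < 2 * 100 + 1;     -- page_number * rows_per_page + 1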

I don't understand how this works. If I have a WHERE condition to filter my results, won't it still scan the whole table, assign every matching row a unique row number, and only then select a range of row numbers? How is that better than offset/limit? Can someone explain how it works and what its performance is?

Upvotes: 1

Views: 670

Answers (1)

Laurenz Albe

Reputation: 246188

The code you quote above is actually worse than using LIMIT and OFFSET, because it does essentially the same thing by hand, and the code is complicated enough that PostgreSQL won't figure out that it could use a partial index scan or a top-N heapsort.

I'll show you an example:

\d large
     Table "laurenz.large"
┌────────┬─────────┬───────────┐
│ Column │  Type   │ Modifiers │
├────────┼─────────┼───────────┤
│ id     │ integer │ not null  │
│ val    │ text    │           │
└────────┴─────────┴───────────┘
Indexes:
    "large_pkey" PRIMARY KEY, btree (id)

This is what paging with OFFSET and LIMIT does, depending on whether you have an index or not:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM large
ORDER BY val
OFFSET 50 LIMIT 10;
                                        QUERY PLAN
------------------------------------------------------------------------------------
 Limit  (cost=52868.58..52868.60 rows=10 width=37)
        (actual time=1657.981..1658.001 rows=10 loops=1)
   Buffers: shared hit=8334
   ->  Sort  (cost=52868.45..55368.45 rows=1000000 width=37)
             (actual time=1657.909..1657.950 rows=60 loops=1)
         Sort Key: val
         Sort Method: top-N heapsort  Memory: 29kB
         Buffers: shared hit=8334
         ->  Seq Scan on large  (cost=0.00..18334.00 rows=1000000 width=37)
                                (actual time=0.010..721.285 rows=1000000 loops=1)
               Buffers: shared hit=8334
 Planning time: 0.078 ms
 Execution time: 1658.036 ms

Without an index, PostgreSQL has to scan the whole table, but it can at least figure out that it only needs the first 60 rows (note the top-N heapsort in the plan above).

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM large
ORDER BY id
OFFSET 50 LIMIT 10;
                                        QUERY PLAN
-----------------------------------------------------------------------------------------
 Limit  (cost=2.14..2.48 rows=10 width=37)
        (actual time=0.100..0.121 rows=10 loops=1)
   Buffers: shared hit=4
   ->  Index Scan using large_pkey on large  (cost=0.42..34317.43 rows=1000000 width=37)
                                              (actual time=0.022..0.073 rows=60 loops=1)
         Buffers: shared hit=4
 Planning time: 0.130 ms
 Execution time: 0.158 ms

With an index, things are reasonably fast, because an offset of 50 is small enough not to hurt much.
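
(For comparison, the seek method mentioned in the question turns the page boundary into an index condition, so the cost stays constant no matter how deep you page; the trade-off is that you can only step to adjacent pages. A sketch:)

-- seek/keyset pagination: remember the last id of the previous page
-- (50 is an assumed value here) and continue from it
SELECT * FROM large
WHERE id > 50
ORDER BY id
LIMIT 10;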

Now let's try the same thing using row_number:

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, val
   FROM (SELECT id, val,
                row_number() OVER (ORDER BY val)
         FROM large
        ) q
   WHERE row_number BETWEEN 51 AND 60;
                                        QUERY PLAN
----------------------------------------------------------------------------------------
 Subquery Scan on q  (cost=172682.84..205182.84 rows=5000 width=37)
                     (actual time=5663.090..10611.557 rows=10 loops=1)
   Filter: ((q.row_number >= 51) AND (q.row_number <= 60))
   Rows Removed by Filter: 999990
   Buffers: shared hit=8334, temp read=9770 written=9770
   ->  WindowAgg  (cost=172682.84..190182.84 rows=1000000 width=45)
                  (actual time=5662.803..9795.077 rows=1000000 loops=1)
         Buffers: shared hit=8334, temp read=9770 written=9770
         ->  Sort  (cost=172682.84..175182.84 rows=1000000 width=37)
                   (actual time=5662.784..8099.025 rows=1000000 loops=1)
               Sort Key: large.val
               Sort Method: external merge  Disk: 48848kB
               Buffers: shared hit=8334, temp read=9770 written=9770
               ->  Seq Scan on large  (cost=0.00..18334.00 rows=1000000 width=37)
                                       (actual time=0.015..827.945 rows=1000000 loops=1)
                     Buffers: shared hit=8334
 Planning time: 0.175 ms
 Execution time: 10621.032 ms

PostgreSQL sorts the whole table, then scans the result and throws away everything but the rows needed.

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, val
   FROM (SELECT id, val,
                row_number() OVER (ORDER BY id)
         FROM large
        ) q
   WHERE row_number BETWEEN 51 AND 60;
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
 Subquery Scan on q  (cost=0.42..64317.43 rows=5000 width=37)
                     (actual time=0.319..3411.027 rows=10 loops=1)
   Filter: ((q.row_number >= 51) AND (q.row_number <= 60))
   Rows Removed by Filter: 999990
   Buffers: shared hit=11069
   ->  WindowAgg  (cost=0.42..49317.43 rows=1000000 width=45)
                  (actual time=0.040..2585.197 rows=1000000 loops=1)
         Buffers: shared hit=11069
         ->  Index Scan using large_pkey on large (cost=0.42..34317.43 rows=1000000 width=37)
                                                   (actual time=0.024..895.798 rows=1000000 loops=1)
               Buffers: shared hit=11069
 Planning time: 0.261 ms
 Execution time: 3411.119 ms

PostgreSQL doesn't have to sort, but it still scans the whole subquery result and throws away most of the rows.

Upvotes: 2
