Reputation: 1001
I am using Postgres and trying to paginate results. The two standard methods did not meet my needs.
Then I came across the ROW_NUMBER() method. The code is as follows:
DECLARE @row_per_page INT = 100
DECLARE @page_number INT = 2

SELECT * FROM
    (SELECT ROW_NUMBER() OVER (ORDER BY [ID]) AS [RowNumber], *
     FROM table_name) AS T
WHERE T.[RowNumber] > (@page_number-1)*@row_per_page
  AND T.[RowNumber] < @page_number*@row_per_page+1
I don't understand how this works. If I have a WHERE condition to filter my results, won't it still scan the whole table, assign each matching row a unique row number, and only then select a range of row numbers? How is that better than OFFSET/LIMIT? Can someone explain how it works and what its performance is?
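For reference, since plain PostgreSQL has no DECLARE variables, I assume the Postgres equivalent inlines the values (page size 100, page number 2), something like:

```sql
-- Hypothetical Postgres translation of the snippet above,
-- with @row_per_page = 100 and @page_number = 2 inlined.
SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY id) AS row_number, *
      FROM table_name) AS t
WHERE t.row_number > (2 - 1) * 100
  AND t.row_number < 2 * 100 + 1;
```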
Upvotes: 1
Views: 670
Reputation: 246188
The code you quote above is actually worse than using LIMIT and OFFSET, because it does essentially the same thing by hand, and the code is complicated enough that PostgreSQL won't figure out that it could use a partial index scan or a top-N heapsort.
I'll show you an example:
\d large
Table "laurenz.large"
┌────────┬─────────┬───────────┐
│ Column │ Type │ Modifiers │
├────────┼─────────┼───────────┤
│ id │ integer │ not null │
│ val │ text │ │
└────────┴─────────┴───────────┘
Indexes:
"large_pkey" PRIMARY KEY, btree (id)
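The setup isn't shown above; presumably a table like this can be reproduced with something along these lines (the million-row count matches the plans below):

```sql
-- Hypothetical setup for reproducing the test table "large".
CREATE TABLE large (
    id  integer PRIMARY KEY,
    val text
);

-- One million rows of random-looking text.
INSERT INTO large (id, val)
SELECT i, md5(random()::text)
FROM generate_series(1, 1000000) AS i;

ANALYZE large;
```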
This is what paging with OFFSET and LIMIT does, depending on whether you have an index or not:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM large
ORDER BY val
OFFSET 50 LIMIT 10;
QUERY PLAN
------------------------------------------------------------------------------------
 Limit  (cost=52868.58..52868.60 rows=10 width=37)
        (actual time=1657.981..1658.001 rows=10 loops=1)
   Buffers: shared hit=8334
   ->  Sort  (cost=52868.45..55368.45 rows=1000000 width=37)
             (actual time=1657.909..1657.950 rows=60 loops=1)
         Sort Key: val
         Sort Method: top-N heapsort  Memory: 29kB
         Buffers: shared hit=8334
         ->  Seq Scan on large  (cost=0.00..18334.00 rows=1000000 width=37)
                                (actual time=0.010..721.285 rows=1000000 loops=1)
               Buffers: shared hit=8334
 Planning time: 0.078 ms
 Execution time: 1658.036 ms
Without index, PostgreSQL has to scan the whole table, but it can at least figure out that it only needs the first 60 rows.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM large
ORDER BY id
OFFSET 50 LIMIT 10;
QUERY PLAN
-----------------------------------------------------------------------------------------
 Limit  (cost=2.14..2.48 rows=10 width=37)
        (actual time=0.100..0.121 rows=10 loops=1)
   Buffers: shared hit=4
   ->  Index Scan using large_pkey on large  (cost=0.42..34317.43 rows=1000000 width=37)
                                             (actual time=0.022..0.073 rows=60 loops=1)
         Buffers: shared hit=4
 Planning time: 0.130 ms
 Execution time: 0.158 ms
With an index, things are reasonably fast because the offset of 50 is small enough that it will not hurt so much.
Now let's try the same thing using row_number:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, val
FROM (SELECT id, val,
row_number() OVER (ORDER BY val)
FROM large
) q
WHERE row_number BETWEEN 51 AND 60;
QUERY PLAN
----------------------------------------------------------------------------------------
 Subquery Scan on q  (cost=172682.84..205182.84 rows=5000 width=37)
                     (actual time=5663.090..10611.557 rows=10 loops=1)
   Filter: ((q.row_number >= 51) AND (q.row_number <= 60))
   Rows Removed by Filter: 999990
   Buffers: shared hit=8334, temp read=9770 written=9770
   ->  WindowAgg  (cost=172682.84..190182.84 rows=1000000 width=45)
                  (actual time=5662.803..9795.077 rows=1000000 loops=1)
         Buffers: shared hit=8334, temp read=9770 written=9770
         ->  Sort  (cost=172682.84..175182.84 rows=1000000 width=37)
                   (actual time=5662.784..8099.025 rows=1000000 loops=1)
               Sort Key: large.val
               Sort Method: external merge  Disk: 48848kB
               Buffers: shared hit=8334, temp read=9770 written=9770
               ->  Seq Scan on large  (cost=0.00..18334.00 rows=1000000 width=37)
                                      (actual time=0.015..827.945 rows=1000000 loops=1)
                     Buffers: shared hit=8334
 Planning time: 0.175 ms
 Execution time: 10621.032 ms
(14 rows)
PostgreSQL sorts the whole table, then scans the result and throws away everything but the rows needed.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, val
FROM (SELECT id, val,
row_number() OVER (ORDER BY id)
FROM large
) q
WHERE row_number BETWEEN 51 AND 60;
QUERY PLAN
---------------------------------------------------------------------------------------------
 Subquery Scan on q  (cost=0.42..64317.43 rows=5000 width=37)
                     (actual time=0.319..3411.027 rows=10 loops=1)
   Filter: ((q.row_number >= 51) AND (q.row_number <= 60))
   Rows Removed by Filter: 999990
   Buffers: shared hit=11069
   ->  WindowAgg  (cost=0.42..49317.43 rows=1000000 width=45)
                  (actual time=0.040..2585.197 rows=1000000 loops=1)
         Buffers: shared hit=11069
         ->  Index Scan using large_pkey on large  (cost=0.42..34317.43 rows=1000000 width=37)
                                                   (actual time=0.024..895.798 rows=1000000 loops=1)
               Buffers: shared hit=11069
 Planning time: 0.261 ms
 Execution time: 3411.119 ms
PostgreSQL doesn't have to sort, but it still scans the whole subquery result and throws away most of the rows.
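As an aside: once the offset gets large, even the indexed LIMIT/OFFSET plan has to walk past every skipped index entry. The usual way around that is keyset pagination, where the application remembers the last id of the previous page and filters on it instead of skipping rows. A sketch (the literal 60 stands in for whatever id your application last saw):

```sql
-- Keyset pagination: continue after the last row of the previous page.
-- The value 60 is a placeholder for the last id seen by the application.
SELECT id, val
FROM large
WHERE id > 60
ORDER BY id
LIMIT 10;
```

This stays fast for any page, because the index scan starts directly at the right place instead of counting rows from the beginning.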
Upvotes: 2