Reputation: 755
I have an issue using multiple order by in postgres window functions. Here is short example. Select total number of rows, N first and N last rows from in a single query. (It is not the task that I want to achieve just an example of the issue) Is it expected behavior or a bug in postgres? I'm using postgres 9.6
select generate_series(1, 10) id
into q;
select
count(*) over (),
lag(id , 0) over (order by id asc) a,
lag(id , 0) over (order by id desc) d
from q
limit 5;
Output:
10,10,10
10,9,9
10,8,8
10,7,7
10,6,6
Expected:
10,1,10
10,2,9
10,3,8
10,4,7
10,5,6
Code work good if only N first or only N last rows are selected.
Upvotes: 3
Views: 1035
Reputation: 755
Order from one of the over clause are applied to the data. But data only sorted once. The desired behavior could be achieved by following query.
select count(*) over (),
(array_agg(id) over (order by id asc rows between unbounded preceding and unbounded following))[row_number() over ()] a,
(array_agg(id) over (order by id desc rows between unbounded preceding and unbounded following))[row_number() over ()] d
from q
order by id
limit 5
maybe memory ineffective for large tables because array_agg construct array from all rows.
Upvotes: 0
Reputation: 23726
I once had a similar problem which had the same explanation: https://stackoverflow.com/a/48668220/3984221
Explanation of the behaviour:
You can explain this when you have a look into the EXPLAIN output:
> | QUERY PLAN |
> | :--------------------------------------------------------------------------------------------------------------------------- |
> | WindowAgg (cost=368.69..445.19 rows=2550 width=20) (actual time=0.146..0.150 rows=10 loops=1) |
> | -> WindowAgg (cost=368.69..413.32 rows=2550 width=12) (actual time=0.128..0.136 rows=10 loops=1) |
> | -> Sort (cost=368.69..375.07 rows=2550 width=8) (actual time=0.126..0.128 rows=10 loops=1) |
> | Sort Key: id |
> | Sort Method: quicksort Memory: 25kB |
> | -> WindowAgg (cost=179.78..224.41 rows=2550 width=8) (actual time=0.048..0.056 rows=10 loops=1) |
> | -> Sort (cost=179.78..186.16 rows=2550 width=4) (actual time=0.033..0.034 rows=10 loops=1) |
> | Sort Key: id DESC |
> | Sort Method: quicksort Memory: 25kB |
> | -> Seq Scan on q (cost=0.00..35.50 rows=2550 width=4) (actual time=0.013..0.014 rows=10 loops=1) |
> | Planning Time: 0.292 ms |
> | Execution Time: 0.445 ms |
Here you can see: First there is a SORT Key: id DESC
. So everything is ordered in DESC
order. If you have only the DESC
ordered function, this would be your result, as you already saw. Now, you have a second window function. So, the entire result will be sorted a second time, into the ASC
order, incl. your first result. So, your first lag()
result 10, 9, 8, 7, 6, ...
will be ordered back into 1, 2, 3, 4, 5, ...
Afterwards the second lag()
result will be added.
However, your specific result for your lag()
function is explainable, of course: You don't shift your data, so you get the current value. You can cross check this (as I did in the fiddle above), when you turn your 0
shift value into 1
. Then your DESC
lag()
will return 2
for id 1
, but ASC
gives NULL
. Everything's fine.
So, to create your expected output, you need another approach, e.g. using row_number()
to add the row count in ASC
and DESC
order and filter them afterwards:
SELECT
COUNT(*) OVER (),
a.id,
d.id
FROM (
select
id,
row_number() over (order by id asc)
from q
) a
JOIN (
select
id,
row_number() over (order by id desc)
from q
) d ON a.row_number = d.row_number
LIMIT 5
Upvotes: 3
Reputation: 37482
The second parameter of lag()
determines how many rows to look back. So lag(id, 0)
means to look back zero rows, which makes lag(id, 0)
equivalent of just id
. So the result you get is perfectly sane.
Do get what you want, you can use row_number()
to join on.
SELECT count(*) OVER (),
x1.id,
x2.id
FROM (SELECT id,
row_number() OVER (ORDER BY id ASC) r
FROM q) x1
INNER JOIN (SELECT id,
row_number() OVER (ORDER BY id DESC) r
FROM q) x2
ON x2.r = x1.r
ORDER BY x1.r
LIMIT 5;
Upvotes: 0