Victor
Victor

Reputation: 3978

Does Postgres window function perform an implicit filtering when using order by in the partitions?

I'm wondering what is happening here in postgres, consider this ready-to-run snippet

select id, 
    value,
    array_agg(id) over (order by value asc) as "array_agg(id) with order",
    array_agg(id) over () as "array_agg(id) without order"
from
(
    values
    (1, 1000),
    (2, 2000)
) as rows (id, value)
order by id asc

Example 1 Results

You may notice that i'm using window function to get aggregrates from the window frame. In both projections "array_agg(id) with order" and "array_agg(id) without order", no filtering is beign made. So i wonder, why the column where i'm ordering the partition is actually giving me the impression of filtering by having just one id in the ordered and two in the not ordered, ¿what verb you put here? for me it would be filtering. The weird thing that gets me more paranoid is that the window frame looks like the same in both partitions when I use a not aggregate function like "lead", please serve yourself:

select id, 
    value,
    lead(id) over (order by value asc) as "lead(id) with order",
    lead(id) over () as "lead(id) without order",
    array_agg(id) over (order by value asc) as "array_agg(id) with order",
    array_agg(id) over () as "array_agg(id) without order"
from
(
    values
    (1, 1000),
    (2, 2000)
) as rows (id, value)
order by id asc

Results example 2

I read the official doc about Window Functions with no clues on this. If any one may explain what is the true behind the milanesa i will more than greatful.

Upvotes: 2

Views: 556

Answers (2)

Laurenz Albe
Laurenz Albe

Reputation: 247595

Like the documentation says:

The frame_clause specifies the set of rows constituting the window frame, which is a subset of the current partition, for those window functions that act on the frame instead of the whole partition.

Now array_agg acts on the frame (it aggregates all rows in the frame), while lead doesn't. The documentation goes on to explain:

The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer. Without ORDER BY, this means all rows of the partition are included in the window frame, since all rows become peers of the current row.

So the presence of ORDER BY changes the meaning of the default frame of the window.

Upvotes: 3

jjanes
jjanes

Reputation: 44343

You have overlooked this paragraph from the tutorial:

By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause.

To overcome that, use this construct:

select id, 
    value,
    array_agg(id) over (order by value asc rows between unbounded preceding and unbounded following) as "array_agg(id) with order",
    array_agg(id) over () as "array_agg(id) without order"
from
(
    values
    (1, 1000),
    (2, 2000)
) as rows (id, value)
order by id asc;

Upvotes: 3

Related Questions