Illogical result using Order By along with Partition By

Question

Let's say there is a table dd:

id (integer)	name (varchar)	ts (date)
1	first	2021-03-25
2	first	2021-03-30

When I query this table with following:

SELECT *, MAX(ts) OVER (PARTITION BY name ORDER BY ts) max_ts FROM dd;

Then the result is:

id (integer)	name (varchar)	ts (date)	max_ts (date)
1	first	2021-03-25	2021-03-25
2	first	2021-03-30	2021-03-30

When I add "DESC" to Order By clause:

SELECT *, MAX(ts) OVER (PARTITION BY name ORDER BY ts DESC) max_ts FROM dd;

The result is:

id (integer)	name (varchar)	ts (date)	max_ts (date)
2	first	2021-03-30	2021-03-30
1	first	2021-03-25	2021-03-30

This time the result is what I expect. Considering that I am partitioning records by name and then getting the max date from them, I expect the max_ts values to be the same (the max one) in both cases, since the order should not really matter when getting the max value from the group. But in fact, in the first case the result contains different max_ts values, not the maximum one.

Why does it work this way? Why does ordering affect the result?

Gordon Linoff · Accepted Answer

This syntax:

MAX(ts) OVER (PARTITION BY name ORDER BY ts)

is a cumulative maximum ordered by ts. The window frame starts with the smallest value of ts. Each subsequent row is larger -- because the ORDER BY is the same column as ts. This is not interesting; ts on each row is the cumulative maximum when ordered by ts.

On the other hand:

MAX(ts) OVER (PARTITION BY name ORDER BY ts DESC)

This is the cumulative maximum in reverse order. So, the first row in the window frame is the maximum ts. All subsequent rows will be the maximum.

This is not the most efficient way to express this, though. I think this better captures the logic you want:

MAX(ts) OVER (PARTITION BY name)

Illogical result using Order By along with Partition By

Answers (1)

Related Questions