Reputation: 17
Let's say there is a table dd:
id (integer) | name (varchar) | ts (date) |
---|---|---|
1 | first | 2021-03-25 |
2 | first | 2021-03-30 |
When I query this table with following:
SELECT *, MAX(ts) OVER (PARTITION BY name ORDER BY ts) max_ts FROM dd;
Then the result is:
id (integer) | name (varchar) | ts (date) | max_ts (date) |
---|---|---|---|
1 | first | 2021-03-25 | 2021-03-25 |
2 | first | 2021-03-30 | 2021-03-30 |
When I add "DESC" to Order By clause:
SELECT *, MAX(ts) OVER (PARTITION BY name ORDER BY ts DESC) max_ts FROM dd;
The result is:
id (integer) | name (varchar) | ts (date) | max_ts (date) |
---|---|---|---|
2 | first | 2021-03-30 | 2021-03-30 |
1 | first | 2021-03-25 | 2021-03-30 |
This time the result is what I expect. Considering that I am partitioning records by name and then getting the max date from them, I expect the max_ts values to be the same (the max one) in both cases, since the order should not really matter when getting the max value from the group. But in fact, in the first case the result contains different max_ts values, not the maximum one.
Why does it work this way? Why does ordering affect the result?
Upvotes: 0
Views: 28
Reputation: 1270723
This syntax:
MAX(ts) OVER (PARTITION BY name ORDER BY ts)
is a cumulative maximum ordered by ts
. The window frame starts with the smallest value of ts
. Each subsequent row is larger -- because the ORDER BY
is the same column as ts
.
This is not interesting; ts
on each row is the cumulative maximum when ordered by ts
.
On the other hand:
MAX(ts) OVER (PARTITION BY name ORDER BY ts DESC)
This is the cumulative maximum in reverse order. So, the first row in the window frame is the maximum ts
. All subsequent rows will be the maximum.
This is not the most efficient way to express this, though. I think this better captures the logic you want:
MAX(ts) OVER (PARTITION BY name)
Upvotes: 1