BillWang
BillWang

Reputation: 11

Using the window function "last_value", when the values of the sorted field are same, the value snowflake returns is not the last value

As we all known, the window function "last_value" returns the last value within an ordered group of values. In the following example, group by field "A" and sort by field "B" in positive order. In the group of "A = 1", the last value is returned, which is, the C value 4 when B = 2. However, in the group of "A = 2", the values of field "B" are the same. At this time, instead of the last value, which is, the C value 4 in line 6, the first C value 1 in B = 2 is returned. This puzzles me why the last value within an ordered group of values is not returned when I encounter the value I want to use for sorting.

Example

row_number A B C LAST_VALUE(C) IGNORE NULLS OVER (PARTITION BY A ORDER BY B ASC)
1 1 1 2 4
2 1 1 1 4
3 1 1 3 4
4 1 2 4 4
5 2 2 1 1
6 2 2 4 1

Upvotes: 1

Views: 3470

Answers (3)

Asim Rout
Asim Rout

Reputation: 103

Good day Bill!

Right, the sorting is not stable and it will return different output each time.

To get stable results, we can run something like below

select
column1,
column2,
column3,
last_value(column3) over (partition by column1 order by 
column2,column3) as column2_last
from values
(1,1,2), (1,1,1), (1,1,3),
(1,2,4), (2,2,1), (2,2,4)
 order by column1;

Upvotes: 0

Lukasz Szozda
Lukasz Szozda

Reputation: 175924

This puzzles me why the last value within an ordered group of values is not returned when I encounter the value I want to use for sorting.

For partition A equals 2 and column B, there is a tie:

enter image description here

The sort is NOT stable. To achieve stable sort a column or a combination of columns in ORDER BY clause must be unique.


To ilustrate it:

SELECT C
FROM tab
WHERE A = 2
ORDER BY B
LIMIT 1;

It could return either 1 or 4.

Upvotes: 2

NickW
NickW

Reputation: 9798

If you sort by B within A then any duplicate rows (same A and B values) could appear in any order and therefore last_value could give any of the possible available values.

If you want a specific row, based on some logic, then you would need to sort by all columns within the group to reflect that logic. So in your case you would need to sort by B and C

Upvotes: 0

Related Questions