hive rows preceding unexpected behavior

Question

Given this ridiculously simple data set:

+--------+-----+
| Bucket | Foo |
+--------+-----+
|      1 | A   |
|      1 | B   |
|      1 | C   |
|      1 | D   |
+--------+-----+

I want to see the value of Foo in the previous row:

select
foo,
max(foo) over (partition by bucket order by foo rows between 1 preceding and 1 preceding) as prev_foo
from
...

Which gives me:

+--------+-----+----------+
| Bucket | Foo | Prev_Foo |
+--------+-----+----------+
|      1 | A   | A        |
|      1 | B   | A        |
|      1 | C   | B        |
|      1 | D   | C        |
+--------+-----+----------+

Why do I get 'A' back for the first row? I would expect it to be be null. It's throwing off calculations where I'm looking for that null. I can work around it by throwing a row_number() in there, but I'd prefer to handle it with fewer calcs.

leftjoin · Accepted Answer

use the LAG function to get previous row:

LAG(foo) OVER(partition by bucket order by foo) as Prev_Foo

hive rows preceding unexpected behavior

Answers (1)

Related Questions