Reputation: 29
I have the following table which gets incremental updates. I need to write a normal Hive query to merge the rows with the same key value with most recent values.
Key | A | B | C | Timestamp
K1 | X | Null | Null | 2015-05-03
K1 | Null | Y | Z | 2015-05-02
K1 | Foo | Bar | Baz | 2015-05-01
Want to get:
Key | A | B | C | Timestamp
K1 | X | Y | Z | 2015-05-03
Upvotes: 1
Views: 970
Reputation: 38335
Use first_value() function to get last not null value. Need to concat sort keys because last_value works only with one sort key.
Demo:
select distinct
key,
first_value(A) over (partition by Key order by concat(case when A is null then '1' else '2' end,'_',Timestamp)desc) A,
first_value(B) over (partition by Key order by concat(case when B is null then '1' else '2' end,'_',Timestamp)desc) B,
first_value(C) over (partition by Key order by concat(case when C is null then '1' else '2' end,'_',Timestamp)desc) C,
max(timestamp) over(partition by key) timestamp
from
( ---------Replace this subquery with your table
select 'K1' key, 'X' a, Null b, Null c, '2015-05-03' timestamp union all
select 'K1' key, null a, 'Y' b, 'Z' c, '2015-05-02' timestamp union all
select 'K1' key, 'Foo' a, 'Bar' b, 'Baz' c, '2015-05-01' timestamp
)s
;
Output:
OK
key a b c timestamp
K1 X Y Z 2015-05-03
Upvotes: 0