Reputation: 115
I have a hive-table containing transactional data of the form (highly simplified)
id Status Value UpdatedTimeStamp
515 Open 1 2014-05-08T11:43:27
516 Open NULL 2014-05-08T11:43:27
515 Answered 1 2014-05-09T11:43:27
515 Closed NULL 2014-05-10T11:43:27
517 Open NULL 2014-05-09T11:43:27
516 Closed NULL 2014-05-09T11:43:27
For my use-case, I need an output table containing unique ids with column entries as the last non-NULL value in that column for that particular id when sorted by timestamp.
Hence, the desired output is:
id Status Value UpdatedTimeStamp
515 Closed 1 2014-05-10T11:43:27
516 Closed NULL 2014-05-09T11:43:27
517 Open NULL 2014-05-09T11:43:27
I am trying to achieve that by using windowing functions on partitions
INSERT OVERWRITE TABLE testSample2 SELECT id, FIRST_VALUE (Status) OVER SortedData, FIRST_VALUE (Value IGNORE NULLS) OVER SortedData, FIRST_VALUE (UpdatedTimeStamp) OVER SortedData FROM testSample WINDOW SortedData (PARTITION BY id ORDER BY UpdatedTimeStamp DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
This gives me NullPointerException (Probably, IGNORE NULLS is not yet supported in Hive). Please suggest me how to take the last non-NULL value.
Upvotes: 2
Views: 2080
Reputation: 11
Try to use something like this:
select ss.vl1, FIRST_VALUE(ss.ky) OVER (PARTITION BY ss.vl1 ORDER BY ss.nl, ss.vl rows unbounded preceding)
FROM (
SELECT IF(s.vl=123456, 'sasa', 'AAAAAA') as vl1, s.vl, s.ky, IF(s.ky IS NULL, 1, 0) as nl
FROM (
select explode(map(NULL, 123, 1, 1234, 2, 12345, 3, 123456)) as (ky, vl) from default.dual) s ) ss;
vl1 _wcol0
AAAAAA 1
AAAAAA 1
AAAAAA 1
sasa 3
Upvotes: 1