orestes

Reputation: 25

How to use a recursive query in Hive

My table has blank values in the Sequence column.

ID          Page            Timestamp   Sequence
Orestes     Login           152356      1
Orestes     Account view    152368  
Orestes     Transfer        152380  
Orestes     Account view    162382      2 
Orestes     Loan            162393  
Antigone    Login           152382      1
Antigone    Transfer        152390  

I want to fill in the blanks so that it looks like this:

ID          Page            Timestamp   Sequence
Orestes     Login           152356      1
Orestes     Account view    152368      1
Orestes     Transfer        152380      1
Orestes     Account view    162382      2 
Orestes     Loan            162393      2
Antigone    Login           152382      1
Antigone    Transfer        152390      1

Here is what I have tried:

with r1
as
(select id, page, timestamp, lag(sequence) over (partition by id order by timestamp) as sequence from log),
r2
as
(select id, page, timestamp, sequence from log)
insert into test1
select a.id, a.page, a.timestamp, case when a.sequence is not null then a.sequence
                                       when b.sequence is not null then b.sequence 
                                       else a.sequence
                                   end
from r1 a join r2 b on a.id=b.id and a.timestamp=b.timestamp
;
create table test2 like test1
;
with r1
as
(select id, page, timestamp, lag(sequence) over (partition by id order by timestamp) as sequence from test1),
r2
as
(select id, page, timestamp, sequence from test1)
insert into test2
select a.id, a.page, a.timestamp, case when a.sequence is not null then a.sequence
                                       when b.sequence is not null then b.sequence 
                                       else a.sequence
                                   end
from r1 a join r2 b on a.id=b.id and a.timestamp=b.timestamp
;
create table test3 like test2
;
...and I repeat this, filling one more blank on each pass, until my fingers go numb.

How do I fill each blank with the immediately preceding non-blank value, as shown above? I think I need a recursive query, but I cannot find a way to write one.

Upvotes: 1

Views: 3591

Answers (1)

Lyashko Kirill

Reputation: 543

You don't need a recursive query at all.

There are two functions in Hive that can help you:

  • LAST_VALUE - returns the last value of a column within the window; with TRUE as the second argument it skips NULLs
  • COALESCE - returns the first non-NULL value among its arguments
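To make the behaviour of the two functions concrete, here is a minimal sketch that shows each one side by side with the raw column. It assumes the blanks are stored as NULL (not as empty strings), that the table is called original_table with a timestamp column named ts, and that -1 is an acceptable placeholder; all of these are illustrative assumptions.

```sql
-- Sketch only: table name, column names and the -1 placeholder are assumptions.
SELECT
    id,
    ts,
    sequence,
    -- Last non-NULL sequence seen so far for this id (TRUE = skip NULLs).
    LAST_VALUE(sequence, TRUE) OVER (
        PARTITION BY id
        ORDER BY ts
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS last_non_null,
    -- First non-NULL argument, read left to right.
    COALESCE(sequence, -1) AS sequence_or_default
FROM original_table;
```

If the blanks turn out to be empty strings rather than NULL, they would need to be converted first, for example with CASE WHEN sequence = '' THEN NULL ELSE sequence END.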

So your query should look like:

create table tmp_table like original_table;

insert into tmp_table
SELECT
    id, 
    page, 
    ts,
    -- PARTITION BY id keeps one user's sequence from leaking into another user's rows
    COALESCE(sequence, 
             LAST_VALUE(sequence, TRUE) OVER(PARTITION BY id ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW))
FROM original_table;
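One way to sanity-check the result is to count the rows that still lack a sequence. This is a sketch that assumes the blanks are stored as NULL and that tmp_table was filled by the insert above.

```sql
-- Sketch only: assumes tmp_table has been populated by the insert above.
-- Lists each id that still has rows without a sequence, with the row count.
SELECT
    id,
    COUNT(*) AS remaining_blanks
FROM tmp_table
WHERE sequence IS NULL
GROUP BY id;
```

Rows that come before the first non-NULL value for their id would still be NULL here, because there is no earlier value to carry forward.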

Upvotes: 2
