Reputation: 25
My table has blank (NULL) values in the Sequence column.
ID        Page          Timestamp  Sequence
Orestes   Login         152356     1
Orestes   Account view  152368
Orestes   Transfer      152380
Orestes   Account view  162382     2
Orestes   Loan          162393
Antigone  Login         152382     1
Antigone  Transfer      152390
I want to change it as shown below.
ID        Page          Timestamp  Sequence
Orestes   Login         152356     1
Orestes   Account view  152368     1
Orestes   Transfer      152380     1
Orestes   Account view  162382     2
Orestes   Loan          162393     2
Antigone  Login         152382     1
Antigone  Transfer      152390     1
I have tried...
with r1
as
(select id, page, timestamp, lag(sequence) over (partition by id order by timestamp) as sequence from log),
r2
as
(select id, page, timestamp, sequence from log)
insert into test1
select a.id, a.page, a.timestamp, case when a.sequence is not null then a.sequence
when b.sequence is not null then b.sequence
else a.sequence
end
from r1 a join r2 b on a.id=b.id and a.timestamp=b.timestamp
;
create table test2 like test1
;
with r1
as
(select id, page, timestamp, lag(sequence) over (partition by id order by timestamp) as sequence from test1),
r2
as
(select id, page, timestamp, sequence from test1)
insert into test2
select a.id, a.page, a.timestamp, case when a.sequence is not null then a.sequence
when b.sequence is not null then b.sequence
else a.sequence
end
from r1 a join r2 b on a.id=b.id and a.timestamp=b.timestamp
;
create table test3 like test2
;
and I repeat this to fill the next blank, until my fingers are numb...
How do I fill each blank with the immediately preceding value, as shown above? I think I should use a recursive query, but I cannot find a way.
Upvotes: 1
Views: 3591
Reputation: 543
You don't need a recursive query at all.
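There are two functions in Hive which can help you: COALESCE, which returns its first non-NULL argument, and LAST_VALUE, which skips NULLs when its second argument is TRUE.
So your query should look like: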
create table tmp_table like original_table;
insert into tmp_table
SELECT
id,
page,
ts,
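COALESCE(sequence,
-- PARTITION BY id so that one user's value is never carried over into another user's rows
LAST_VALUE(sequence, TRUE) OVER(PARTITION BY id ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS sequence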
FROM original_table;
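To check what this kind of fill should produce on the sample rows from the question, here is a small Python emulation. It is not Hive code, only a sketch of the intended semantics: rows are ordered by timestamp within each id, and each blank takes the most recent non-blank sequence for that id. The function name forward_fill and the tuple layout are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (id, page, ts, sequence); None marks a blank.
rows = [
    ("Orestes", "Login", 152356, 1),
    ("Orestes", "Account view", 152368, None),
    ("Orestes", "Transfer", 152380, None),
    ("Orestes", "Account view", 162382, 2),
    ("Orestes", "Loan", 162393, None),
    ("Antigone", "Login", 152382, 1),
    ("Antigone", "Transfer", 152390, None),
]


def forward_fill(rows):
    """Fill each blank sequence with the latest non-blank one for the same id."""
    filled = []
    # Sort by (id, ts) so each id's rows are contiguous and in time order.
    ordered = sorted(rows, key=itemgetter(0, 2))
    for _, group in groupby(ordered, key=itemgetter(0)):
        last_seen = None  # reset for every id, like a per-id window partition
        for id_, page, ts, seq in group:
            if seq is not None:
                last_seen = seq
            filled.append((id_, page, ts, last_seen))
    return filled


for row in forward_fill(rows):
    print(row)
```

A row that comes before the first non-blank value for its id stays blank here, since there is nothing earlier to copy from.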
Upvotes: 2