Reputation: 25
i have a table in hive with two columns: session_id and duration_time like this:
|| session_id || duration||
1 14
1 10
1 20
1 10
1 12
1 16
1 8
2 9
2 6
2 30
2 22
i want to add a new column with unique id when:
the session_id is changing or the duration_time > 15
i want the output to be like this:
session_id duration unique_id
1 14 1
1 10 1
1 20 2
1 10 2
1 12 2
1 16 3
1 8 3
2 9 4
2 6 4
2 30 5
2 22 6
any ideas how to do that in hive QL?
thanks!
Upvotes: 1
Views: 1701
Reputation: 1270091
SQL tables represent unordered sets. You need a column specifying the ordering of the values, because you seem to care about the ordering. This could be an id column or a created-at column, for instance.
You can do this using a cumulative sum:
select t.*,
sum(case when duration > 15 or seqnum = 1 then 1 else 0 end) over
(order by ??) as unique_id
from (select t.*,
row_number() over (partition by session_id order by ??) as seqnum
from t
) t;
Upvotes: 1