Abed Shareef
Abed Shareef

Reputation: 25

add a new column for unique ID in hive table

i have a table in hive with two columns: session_id and duration_time like this:

|| session_id || duration||

    1               14          
    1               10      
    1               20          
    1               10          
    1               12          
    1               16          
    1               8       
    2               9           
    2               6           
    2               30          
    2               22

i want to add a new column with unique id when:

the session_id is changing or the duration_time > 15

i want the output to be like this:

session_id      duration    unique_id
1               14          1
1               10          1
1               20          2
1               10          2
1               12          2
1               16          3
1               8           3
2               9           4
2               6           4
2               30          5
2               22          6

any ideas how to do that in hive QL?

thanks!

Upvotes: 1

Views: 1701

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270091

SQL tables represent unordered sets. You need a column specifying the ordering of the values, because you seem to care about the ordering. This could be an id column or a created-at column, for instance.

You can do this using a cumulative sum:

select t.*,
       sum(case when duration > 15 or seqnum = 1 then 1 else 0 end) over
           (order by ??) as unique_id
from (select t.*,
             row_number() over (partition by session_id order by ??) as seqnum
      from t
     ) t;

Upvotes: 1

Related Questions