Reputation: 61
I have one hive table with two column. Type of both column is string. One is simple client id and other is command separated string of item ids. There can be multiple row with same client id but different item id string.
I want one hive query which generate table with two column. one is client id and other is comma separated string with all unique item ids.
Data in original table:
Client Id Item Ids
1 1,2,3,4
2 3,4,6,8
4 4,5,1,3
2 3,4,7,8
3 5,6,8,2
4 7,8,9,4
Query should generate this result
Client Id Item Ids
1 1,2,3,4
2 3,4,7,6,8
4 4,5,1,3,7,8,9
3 5,6,8,2
Upvotes: 1
Views: 8424
Reputation: 38335
Use explode()
and collect_set()
to get unique set, aggregate string using concat_ws
and group by Client_id
:
hive> select Client_id, concat_ws(',',collect_set(item_id)) as item_ids
> from test_t lateral view explode(split(item_ids,',')) a as item_id
> group by Client_id;
Output:
OK
1 1,2,3,4
2 3,4,6,8,7
3 5,6,8,2
4 4,5,1,3,7,8,9
Upvotes: 3