user2810706
user2810706

Reputation: 117

Combining two rows with a , in Hive

I am trying to achieve something as below in HIVE.

Input:

Col1  Col2  Col3  Col4  Col5
 A1    B1     C1    D1   E1
 A1    B1     C2    D1   E1
 A3    B3     C3    D3   E3
 A4    B4     C4    D4   E4

Output:

Col1  Col2  Col3  Col4  Col5
 A1    B1    C1,C2  D1   E1
 A3    B3     C3    D3   E3
 A4    B4     C4    D4   E4

Row 1 and Row 2 are similar except for Col3 in input. In output it should be separated by , instead of two rows.

Upvotes: 1

Views: 2976

Answers (1)

collect_set does exactly what you want. Below query should do:

select col1, col2, col4, col5, concat_ws(",", collect_set(col3))  from <table> group by col1, col2, col4, col5;

collect_set removes duplicates gives you a unique list. If you have duplicates in col3 and want to preserve those, you can use collect_list instead. So if your col3 has C1, C1, C2; using collect_set gives "C1, C2". Using collect_list gives "C1, C1, C2"

Edit: The concat_ws converts output in the required format.

Upvotes: 3

Related Questions