Reputation: 117
I am trying to achieve something as below in HIVE.
Input:
Col1 Col2 Col3 Col4 Col5
A1 B1 C1 D1 E1
A1 B1 C2 D1 E1
A3 B3 C3 D3 E3
A4 B4 C4 D4 E4
Output:
Col1 Col2 Col3 Col4 Col5
A1 B1 C1,C2 D1 E1
A3 B3 C3 D3 E3
A4 B4 C4 D4 E4
Row 1 and Row 2 are similar except for Col3 in input. In output it should be separated by ,
instead of two rows.
Upvotes: 1
Views: 2976
Reputation: 977
collect_set
does exactly what you want. Below query should do:
select col1, col2, col4, col5, concat_ws(",", collect_set(col3)) from <table> group by col1, col2, col4, col5;
collect_set
removes duplicates gives you a unique list. If you have duplicates in col3 and want to preserve those, you can use collect_list instead. So if your col3 has C1, C1, C2; using collect_set gives "C1, C2". Using collect_list gives "C1, C1, C2"
Edit: The concat_ws converts output in the required format.
Upvotes: 3