Reputation: 116
It's been asked and answered for SQL (Convert multiple rows into one with comma as separator), would any of the approaches mentioned work in Hive, e.g. to go from this:
+------+------+
| Col1 | Col2 |
+------+------+
| a | 1 |
| a | 5 |
| a | 6 |
| b | 2 |
| b | 6 |
+------+------+
to this:
+------+-------+
| Col1 | Col2 |
+------+-------+
| a | 1,5,6 |
| b | 2,6 |
+------+-------+
Upvotes: 12
Views: 16117
Reputation: 69
Try this
SELECT Col1, concat_ws(',', collect_set(Col2)) as col2
FROM your_table
GROUP BY Col1;
Upvotes: 5
Reputation: 81
And there is collect_list
that will take full list (with duplicates).
Upvotes: 8
Reputation: 2543
The aggregator function collect_set
can achieve what you are trying to get. Here is the documentation. So you can write a query like:
SELECT Col1, collect_set(Col2)
FROM your_table
GROUP BY Col1;
However, there is one striking difference between MySQL's GROUP BY
and Hive's collect_set
that while GROUP_CONCAT
also retains duplicates in the resulting array, collect_set
removes the duplicates occuring in the array. In the example shown by you there are no repeating group values for Col2
so you can go ahead and use it.
Upvotes: 22