Reputation: 1092
I have a hive table like
col1 col2
1 ["apple", "orange"]
1 ["orange", "banana"]
1 ["mango"]
2 ["apple"]
2 ["apple", "orange"]
There data types is
col1 int
col2 array<string>
I want to query something like :
select col1, concat(col2) from table group by col1;
Output should be :
1 ["apple", "orange", "banana", "mango"]
2 ["apple", "orange"]
Is there any function in hive to do this ?
Also I write this data to csv and when I read it as a dataframe I get the col2 dtype as object
. Is there a way to output it as an array
.
Upvotes: 1
Views: 104
Reputation: 31470
Try by exploding the array then use collect_set
function by grouping by col1
.
Example:
Input:
select * from table;
OK
dd.col1 dd.col2
1 ["apple","orange"]
1 ["mango"]
1 ["orange","banana"]
select col1,collect_set(tt1)col2 from (
select * from table lateral view explode(col2) tt as tt1
)cc
group by col1;
Output:
col1 col2
1 ["apple","orange","mango","banana"]
Upvotes: 1