Rahul Sharma
Rahul Sharma

Reputation: 5824

collect_set on array type with group by in hive

I have below table which contains duplicates on id with array of values for each id and I want to find out unique values for each id, how to do that?

CREATE TABLE test(
id string,
values array<string>)

When I run below command, it throws error as collect_set only support primitive type values.

select id, collect_set(values) from ts group by id;

ERROR:

FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but array was passed as parameter 1.

Upvotes: 3

Views: 12024

Answers (2)

Ambrish
Ambrish

Reputation: 3677

As the error message says Only primitive type arguments are accepted but array was passed as parameter 1., you need to convert the array to String before using it.

You can achieve the same using explode() function. Something like:

select 
  id, 
  collect_set(tokens) 
FROM
  ts LATERAL VIEW explode(values) x AS tokens
group by
  id;

Upvotes: 5

Reihan_amn
Reihan_amn

Reputation: 2727

Another informal way of solving this issue, especially in nested queries with many group by and collect_set is to concatenate the array into one string using concat_ws("", values).

select id, collect_set(concat_ws("", values)) from ts group by id;

Upvotes: 1

Related Questions