B_Osipiuk
B_Osipiuk

Reputation: 1038

How to count elements in group, when the element can be assigned to many groups

Given the following data: (id: int, group: string)

+---------+-----------+
| id      | group     | 
+---------+-----------+
| 1       | 'a,b,c'   |
| 2       | 'a'       |
| 3       | 'a,c'     |
+---------+-----------+

I want to get a participant count of each group. The result should be:

+---------+--------+
| group   | count  | 
+---------+--------+
| a       | 3      |
| b       | 1      |
| c       | 2      |
+---------+--------+

How can I achieve it? How the query should look like? I have tried with split() function but without any results :/

Upvotes: 0

Views: 28

Answers (1)

Georgina Skibinski
Georgina Skibinski

Reputation: 13387

This should do the trick:

select
   main.group,
   count(1)
from
   (
   select
       explode(split(group, ",")) as group
   from
       main
   ) main
group by
   main.group

If your query is more complex than the sample you provided, you might also like to read about lateral view.

Ref. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Upvotes: 1

Related Questions