Peter John
Peter John

Reputation: 47

Finding Unique entry in Hive query

I need to find the unique fruit which is farmed by only country, it should not cultivated by more than one country. Count of the fruits by the unique country

fruits           |   farming
-------------------------------
apple            | USA
apple            | INDIA
apple            | JAPAN
apple            | CHINA
orange           | USA
orange           | RUSSIA
orange           | PAKISTAN
strawberry       | INDIA
mango            | AUSTRALIA
pineapple        | AUSTRALIA
banana           | INDIA
mosambi          | INDIA
graphes          | SRILANKA
graphes          | NETHERLAND

Output:

INDIA     | 3
AUSTRALIA | 2

strawberry, banana, mosambi is cultivated only in INDIA so total 3 unique fruits mango, pineapple is cultivated only in AUSTRALIA so total 2 unique fruits

Upvotes: 0

Views: 33

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269503

One method uses two levels of aggregation:

select country, count(*) as num_unique_fruits
from (select fruit, min(country) as country
      from t
      group by fruit
      having count(*) = 1
     ) f
group by country;

Note that if there is only one row, then min(country) is the country on that row.

A more "traditional" approach would use not exists:

select country, count(*)
from t
where not exists (select 1 from t t2 where t2.fruit = t.fruit and t2.country <> t.country)
group by country;

The first probably performs better in Hive, but it can be worth trying both methods if performance is a consideration.

Upvotes: 1

Related Questions