Matt W.
Matt W.

Reputation: 3722

get distinct field in hive query

I'm trying to query a table and de-dupe the query results

select distinct(id), distinct(name), distinct(date)
from table
where data_date = '20180722'
and active = True
group by market, country
order by userrole

What I want this to do is give me a list of distinct id, name, date etc. which is the first row from each of the grouped market, country groups, which are first sorted by userrole. Is there a way to do this in hive?

in R this would be:

df %>%
    select(id, name, date) %>%
    group_by(market, country) %>%
    arrange(userrole) %>%
    slice(1)

Upvotes: 0

Views: 2624

Answers (1)

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49260

distinct in SQL and HiveQL isn't a function, but a keyword. Specifying distinct once per query will give you distinct results, like select distinct col1,col2....

You can use row_number function to achieve this.

select id,name,date
from (select t.*,row_number() over(partition by market,country order by userrole) as rnum
      from tbl t
     ) t
where rnum=1

Upvotes: 1

Related Questions