Reputation: 3722
I'm trying to query a table and de-dupe the query results
select distinct(id), distinct(name), distinct(date)
from table
where data_date = '20180722'
and active = True
group by market, country
order by userrole
What I want this to do is give me a list of distinct id, name, date etc. which is the first row from each of the grouped market, country groups, which are first sorted by userrole. Is there a way to do this in hive
?
in R
this would be:
df %>%
select(id, name, date) %>%
group_by(market, country) %>%
arrange(userrole) %>%
slice(1)
Upvotes: 0
Views: 2624
Reputation: 49260
distinct
in SQL and HiveQL isn't a function, but a keyword. Specifying distinct
once per query will give you distinct results, like select distinct col1,col2...
.
You can use row_number
function to achieve this.
select id,name,date
from (select t.*,row_number() over(partition by market,country order by userrole) as rnum
from tbl t
) t
where rnum=1
Upvotes: 1