Select distinct on specific columns but select other columns also in hive

Question

I have multiple columns in a table in hive having around 80 columns. I need to apply the distinct clause on some of the columns and get the first values from the other columns also. Below is the representation of what I am trying to achieve.

select distinct(col1,col2,col3),col5,col6,col7
from abc where col1 = 'something';

All the columns mentioned above are text columns. So I cannot apply group by and aggregate functions.

Safwan · Accepted Answer

You can use row_number function to solve the problem.

create table temp as
select *, row_number() over (partition by col1,col2,col3) as rn
from abc 
where col1 = 'something';

select *
from temp
where rn=1

You can also sort the table while partitioning.

row_number() over (partition by col1,col2,col3 order by col4 asc) as rn

Select distinct on specific columns but select other columns also in hive

Answers (2)

Related Questions