Surbhi Jain
Surbhi Jain

Reputation: 13

Group by on single column and select multiple columns in hive

I have data in a Hive table as given below.

Table 1:

ID CODE VALUE  

1  XXXX 100  
2  AAAA 200
1  YYYY 300  
3  DDDD 300  
4  BBBB 200  
2  CCCC 300  
3  HHHH 200

I have to select ID, VALUE from Table 1 and load into Table 2.Since Table 1 has duplicate ID's, i have to get unique rows from Table 1 based on the data in CODE Field.

order of selection of rows from table 1 is first priority to be given to CCCC, then to XXXX, then to AAAA and so on.

I am expecting Table 2 to have data like below-

ID VALUE  
1   100
2   300  
3   300  
4   200

Upvotes: 1

Views: 1091

Answers (1)

leftjoin
leftjoin

Reputation: 38335

You can define priority in the CASE statement. Assign priorities to each CODE or list of codes. Priority calculated in the CASE will be used for ordering inside ID partition before numbering rows by row_number analityc function. Rows with the same ID and same priority will be ordered randomly, so better try to define priorities for all codes. See comments in the code:

with your_data as ( --Use your table instead of this CTE
select stack(7,
1,'XXXX',100, 
2,'AAAA',200,
1,'YYYY',300, 
3,'DDDD',300, 
4,'BBBB',200, 
2,'CCCC',300, 
3,'HHHH',200
) as (ID,CODE,VALUE)
)

select ID, VALUE 
from
(
select ID,CODE,VALUE,
       row_number() over(partition by ID order by case when CODE='CCCC' then 1 --define priority here
                                                       when CODE='XXXX' then 2
                                                       when CODE='AAAA' then 3
                                                       when CODE='DDDD' then 4
                                                       when CODE='HHHH' then 5
                                                       --and so on... add more cases
                                                       else 99  --default priority
                                                    end) rn
  from your_data d
)s
where rn=1 --remove duplicates  
;

Result:

OK
1       100
2       300
3       300
4       200

Upvotes: 0

Related Questions