kartik
kartik

Reputation: 93

Max aggregate function in hive

My current table is as follows:

ID    EventID    Time               Count
1     ABC        1435205220000      5
1     ABC        1500205220000      3
2     DEF        1435205220000      4

Output:

ID    EventID    Time               Count
1     ABC        1435205220000      5
2     DEF        1435205220000      4

Currently, I Group by ID, EventID to get max(Count). However, I need Time in the output as well, but if I add Time to Group By column then I wont get the desired output.

Upvotes: 0

Views: 3885

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269603

Use row_number():

select t.*
from (select t.*,
             row_number() over (partition by id order by count desc) as seqnum
      from table t
     ) t
where seqnum = 1;

EDIT:

I would suggest that you update the version of Hive. An alternative is something like this:

select t.*
from table t join
     (select id, max(count) as maxc
      from table t
      group by id
     ) tt
     on t.id = tt.id;

This will return duplicate rows if an id has two with the same max count. (You can get the same effect in the first query by using rank() instead of row_number().

Upvotes: 1

Related Questions