user3273269
user3273269

Reputation: 53

Fetch the column which has the Max value for a row in Hive

I have a scenario where i need to pick the greatest value in the row from three columns, there is a function called Greatest but it doesn't work in my version of Hive 0.13.

Please suggest better way to accomplish it.

Example table:

+---------+------+------+------+
| Col1    | Col2 | Col3 | Col4 |
+---------+------+------+------+
| Group A | 1    | 2    | 3    |
+---------+------+------+------+
| Group B | 4    | 5    | 1    |
+---------+------+------+------+
| Group C | 4    | 2    | 1    |
+---------+------+------+------+

expected Result:

+---------+------------+------------+
| Col1    | output_max | max_column |
+---------+------------+------------+
| Group A | 3          | Col4       |
+---------+------------+------------+
| Group B | 5          | col3       |
+---------+------------+------------+
| Group C | 4          | col2       |
+---------+------------+------------+

Upvotes: 2

Views: 1083

Answers (1)

David דודו Markovitz
David דודו Markovitz

Reputation: 44991

select  col1
       ,tuple.col1                as output_max
       ,concat('Col',tuple.col2)  as max_column

from   (select  Col1
               ,sort_array(array(struct(Col2,2),struct(Col3,3),struct(Col4,4)))[2] as tuple
        from    t
        ) t
;

sort_array(Array)
Sorts the input array in ascending order according to the natural ordering of the array elements and returns it (as of version 0.9.0). https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF


hive> select  col1
    >        ,tuple.col1                as output_max
    >        ,concat('Col',tuple.col2)  as max_column
    >     
    > from   (select  Col1
    >                ,sort_array(array(struct(Col2,2),struct(Col3,3),struct(Col4,4)))[2] as tuple
    >         from    t
    >         ) t
    > ;
OK
Group A 3   Col4
Group B 5   Col3
Group C 4   Col2

Upvotes: 4

Related Questions