sharp
sharp

Reputation: 2158

Hive - sql max number with multiple rows

For the raw data below, how do I get max number per customer_id for the entire row and null for rest of the row? I can get the max for the data but not able to get in the form #Results

#Raw data                           
customer_id  name       location    itemno_1    itemno_2    itemno_3    itemno_4    itemno_5
123          Ashley M   CA          10          null        10       null   null
123          Ashley M   CA          null        12          null        12  null
143          Donald P   FL          15          15          0   1   10
187          Alicia P   GA          15          9           null    null    null
1736         Mike H     CT          null        8           8   9        null
1736         Mike H     CT          null        null       null null         null
1876         David M    CA          null        null       null null         null
532          Matthew T  CA          null        9          10   10  null

Results

customer_id  name       location    itemno_1    itemno_2    itemno_3    itemno_4    itemno_5
123          Ashley M   CA  null    12  null    null    null
143          Donald P   FL  15  null    null    null    null
187          Alicia P   GA  15  null    null    null    null
1736         Mike H     CT  null    null    null    null    null
1876         David M    CA  null    null    null    null    null
532          Matthew T  CA  null    null    null    10  null

Upvotes: 0

Views: 830

Answers (1)

Ram Manohar
Ram Manohar

Reputation: 1004

Below is the query which produces your expected result.(I have tested it works) I have assumed if 2 item_nos have same max value we will keep value in lowest item_no. For example for customer_id = 123 itemno_2 and itemno_4 has value 12 but kept itemno_2 as 12 and made itemno_4 as null.

select customer_id, name, location1
      ,CASE WHEN (i1 >= i2 or i2 is null)
            AND  (i1 >= i3 or i3 is null)
            AND  (i1 >= i4 or i4 is null)
            AND  (i1 >= i5 or i5 is null)
            THEN i1
            ELSE null
       END as itemno_1
      ,CASE WHEN (i2 >= i1 or i1 is null)
            AND  (i2 >= i3 or i3 is null)
            AND  (i2 >= i4 or i4 is null)
            AND  (i2 >= i5 or i5 is null)
            AND  (i1 <> i2 or i1 is null)
            THEN i2
            ELSE null
       END as itemno_2
      ,CASE WHEN (i3 >= i1 or i1 is null)
            AND  (i3 >= i2 or i2 is null)
            AND  (i3 >= i4 or i4 is null)
            AND  (i3 >= i5 or i5 is null)
            AND  (i1 <> i3 or i1 is null)
            AND  (i2 <> i3 or i2 is null)
            THEN i3
            ELSE null
       END as itemno_3
      ,CASE WHEN (i4 >= i1 or i1 is null)
            AND  (i4 >= i2 or i2 is null)
            AND  (i4 >= i3 or i3 is null)
            AND  (i4 >= i5 or i5 is null)
            AND  (i1 <> i4 or i1 is null)
            AND  (i2 <> i4 or i2 is null)
            and  (i3 <> i4 or i3 is null)
            THEN i4
            ELSE null
       END as itemno_4
      ,CASE WHEN (i5 >= i1 or  i1   is null)
            AND  (i5 >= i2 or  i2   is null)
            AND  (i5 >= i3 or  i3   is null)
            AND  (i5 >= i4 or  i4   is null)
            AND  (i1 is null or i1 <> i5)
            AND  (i2 is null or i2 <> i5)
            AND  (i3 is null or i3 <> i5)
            AND  (i4 is null or i4 <> i5)
            THEN i5
            ELSE null
       END as itemno_5

from (
select customer_id, name, location1
      ,max(itemno_1) as i1
      ,max(itemno_2) as i2
      ,max(itemno_3) as i3
      ,max(itemno_4) as i4
      ,max(itemno_5) as i5
from default.stack2
group by customer_id, name, location1) a
order by customer_id;

Same thing can also be achieved by writing UDF instead of case statements to find maximum of 5 columns and return as expected.

Upvotes: 1

Related Questions