Craig
Craig

Reputation: 1985

Hive - max value across columns/multiple rows

My data looks like this:

ip, category, a_score, b_score, c_score
1.2.3.4, X, 0, 5, 0
1.2.3.4, X, 3, 0, NULL
1.2.3.4, Y, 1, NULL, NULL
1.2.3.4, Y, 0, 2, 0
1.2.3.4, Z, 1, 0, 9
1.2.3.4, Z, 0, 2, 0

Each IP may have up to 8 categories and up to 3 valid scores (a_score, b_score, c_score) per category.

For each IP, I need to find the maximum score per category across all scores for that category.

So in this case, I would get

1.2.3.4, X, 5
1.2.3.4, Y, 2
1.2.3.4, Z, 9

i.e., the largest value across a/b/c scores for category X was the c_score in the first row; the largest value across a/b/c scores for category Y was the b_score in the second row; the largest value across a/b/c scores for category Z was the c_score in the first row.

I tried something on the lines of

select ip, category, greatest(max(a_score), max(b_score), max(c_score)) from mytable where ip = '1.2.3.4' group by ip, category;

But this returns only NULLs.

How would I go about this? TIA.

Upvotes: 0

Views: 616

Answers (2)

Barbaros Özhan
Barbaros Özhan

Reputation: 65408

Use coalesce with 0(zero) :

select ip, category, 
     max(greatest(coalesce(a_score,0), coalesce(b_score,0), coalesce(c_score,0))) max_score
  from mytable 
 where ip = '1.2.3.4' 
group by ip, category;

Rextester Demo

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1271003

greatest() has a problem with NULL values, as you see. For your case, you can do something like:

select ip, category,
       nullif(greatest(coalesce(max(a_score), -1), coalesce(max(b_score), -1), coalesce(max(c_score), -1)), -1)
from mytable
where ip = '1.2.3.4'
group by ip, category;

This assumes that the scores are never negative.

An alternative is to unpivot before aggregation. However, that either requires reading the data multiple times or complex case expressions.

Upvotes: 2

Related Questions