Reputation: 451
I have table like this:
A | null | 15
A | X | 30
B | null | 10
B | Y | 20
I would like to create a query which divides the numbers for A, same for B and so on so the output should be:
row1 | 2
row2 | 2
What is the most efficient way to approach this in Hive considering I will need to make about 20 calculations like this in single query?
EDIT: I always has two rows, one is a count for rows with parameter null, the other is count for non-null parameter. I need to get the rate.
Upvotes: 1
Views: 3399
Reputation: 35323
I would use a self join to divide the data into two data sets: one for numerator based on col2 and one for denominator based on col2. and a case statement to handle if the denominator is 0. Division by zero will return 0 in my case. which may or may not be what you want.
We may have to cast the values to decimal I'm not sure if hive has integer math like SQL server does.
SELECT A.Col1
, case when coalesce(A.Col3,0) = 0 then 0 else B.Col3 / A.Col3 end as Rate
FROM Table A
LEFT JOIN table B
on A.Col1=B.Col1
and A.Col2 is null
and B.Col2 is not null
Upvotes: 1
Reputation: 44941
select col1
, max(case when col2 is not null then col3 end)
/ nullif(max(case when col2 is null then col3 end),0) as results
from mytable
group by col1
;
For @vkp
hive> select 1/0;
OK
NULL
Upvotes: 2