user3657361
user3657361

Reputation: 87

Calculating percentage in hive

I am having a little trouble getting the percentage.

colA | cloB | colC
4    |  a   |  y
5    |  b   |  y
7    |  a   |  n
8    |  a   |  y
------------------
Output:
a 67%
b 100%

I have to get the percentage of colC(all the 'y') for each letter in colB.

I have been able to get both totals seperatly but can't seem to get the percentage to work.
Gets the total 'y' for a,b(colB);

SELECT colB, COUNT(*) FROM tbl WHERE colC = '"y"' GROUP BY colB;
Output:
a   2
b   1

Gets totals overall total for colB

SELECT COUNT(colC) FROM tbl WHERE colC = '"y"';
Output:
4

Thanks in advance

Upvotes: 3

Views: 17577

Answers (2)

Junayy
Junayy

Reputation: 1150

I'm not (yet :D) an expert in Hive but you can try a subquery like that :

SELECT 
    t1.colB,
    100*(count(colB)/sub.cnt)
FROM 
    tbl t1,
    (SELECT COUNT(colC) as cnt 
     FROM tbl 
     WHERE colC = 'y') sub
JOIN 
    tbl ON (sub.colC = tbl.colC))
GROUP BY 
    colB;

Ps : I'm not 100% sure and I can't test it actually, just trying to help.

Upvotes: 0

Jerome Banks
Jerome Banks

Reputation: 1630

Use the IF UDF to avoid the join

SELECT t1.colB,
   SUM( IF( colC == 'y', 1 , 0 ) )/ COUNT(*) * 100 as pct
FROM tbl t1
GROUP BY t1.colB;

Upvotes: 9

Related Questions