Reputation: 87
I am having a little trouble getting the percentage.
colA | cloB | colC
4 | a | y
5 | b | y
7 | a | n
8 | a | y
------------------
Output:
a 67%
b 100%
I have to get the percentage of colC(all the 'y') for each letter in colB.
I have been able to get both totals seperatly but can't seem to get the percentage to work.
Gets the total 'y' for a,b(colB);
SELECT colB, COUNT(*) FROM tbl WHERE colC = '"y"' GROUP BY colB;
Output:
a 2
b 1
Gets totals overall total for colB
SELECT COUNT(colC) FROM tbl WHERE colC = '"y"';
Output:
4
Thanks in advance
Upvotes: 3
Views: 17577
Reputation: 1150
I'm not (yet :D) an expert in Hive but you can try a subquery like that :
SELECT
t1.colB,
100*(count(colB)/sub.cnt)
FROM
tbl t1,
(SELECT COUNT(colC) as cnt
FROM tbl
WHERE colC = 'y') sub
JOIN
tbl ON (sub.colC = tbl.colC))
GROUP BY
colB;
Ps : I'm not 100% sure and I can't test it actually, just trying to help.
Upvotes: 0
Reputation: 1630
Use the IF UDF to avoid the join
SELECT t1.colB,
SUM( IF( colC == 'y', 1 , 0 ) )/ COUNT(*) * 100 as pct
FROM tbl t1
GROUP BY t1.colB;
Upvotes: 9