Reputation: 15
I have a column called text and another called 'categories' with three values "positive", "negative", "neutral".
How can I calculate the percentage of each of the text values in category? for example if I had 3 rows, 1 row is positive, 1 row is negative and 1 row is neutral what query will produce 33% positive 33% negative and 33% neutral?
This is the stage I got to...
SELECT COUNT(category), category FROM tweets GROUP BY category
Upvotes: 0
Views: 12966
Reputation: 1269503
As a note, I think this is much more simply written using a single subquery:
select t.category, count(*) / t.total, -- number
concat(100 * count(*) / t.total, '%') -- string
from tweets t join
(select count(*) as total) t
group by category;
If you know there are only three categories, I'd put them in one row:
select avg(category = 'positive') as p_positive,
avg(category = 'negative') as p_negative
avg(category = 'neutral') as p_neutral
from tweets t;
This query uses the MySQL feature that a boolean expression is treated as an integer in a numeric context, with "1" for true and "0" for false.
Upvotes: 1
Reputation: 92785
One way to do it
select category, count, count/total percent
from
(
select category, count(category) count
from tweets
group by category
) c JOIN (
select count(*) total
from tweets
) t
Output:
+----------+-------+---------+ | category | count | percent | +----------+-------+---------+ | negative | 1 | 0.3333 | | neutral | 1 | 0.3333 | | positive | 1 | 0.3333 | +----------+-------+---------+
...would it be possible to return just 33% not 0.3333?
select category, count, round(count / total * 100) percent
from
(
select category, count(category) count
from tweets
group by category
) c JOIN (
select count(*) total
from tweets
) t
+----------+-------+---------+ | category | count | percent | +----------+-------+---------+ | negative | 1 | 33 | | neutral | 1 | 33 | | positive | 1 | 33 | +----------+-------+---------+
If you want to add %
you can use do concat(round(count / total * 100), '%')
but I would strongly suggest to do it (any sort of formatting) in the client code.
Upvotes: 3
Reputation: 1
Just a small modification to your current query:
SELECT COUNT(category)/COUNT(*), category FROM tweets GROUP BY category
Upvotes: -2