Yuriy Kirel
Yuriy Kirel

Reputation: 111

How to calculate percentage of entries out of total those match the condition?? SQL

I've been playing around it for the whole day and it's by far the most hard topic to understand in SQL.

Say we have a students table, which consists of group number and students rating as so:

students

Each group can contain multiple students

And now I want to look for groups where at least 60% of students have rating of 4 or higher.

Expecting something like:

group_n percentage_of_goodies
1120 0.7
1200 0.66
1111 1

I tried this

with group_goodies as (
select group_n,  count(id) goodies from students
    where rating >= 4
    group by group_n
), group_counts as (
select group_n, count(id) acount from students
    group by group_n
)
select cast(group_goodies.goodies as float)/group_counts.acount  from group_counts, group_goodies
where cast(group_goodies.goodies as float)/group_counts.acount > 0.6;

and got an unexpected result result

where percentage seems to surpass 100% (and it's not because I misplaced denominator, since there are controversial outputs below as well), which is obviously is not intended. There are also more output rows than there are groups. Apparently, I could use window functions, but I can't figure it out myself.. So how can I have this query done?

Problem is extracting count of students before and after the query seems to be impossible within a single query, so I had to create 2 CTEs in order to receive needed results. Both of the CTEs seems to output proper result, where in first CTE amount of students rarely exceeds 10, and in second CTE amounts are naturally smaller and match needs. But When I divide them like that, it results in something unreasonable.

If someone explains it properly, one will make my day 😳

Upvotes: 0

Views: 511

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270633

If I understand correctly, this is a pretty direct aggregation query:

select group_id, avg( (rating >= 4)::int ) as good_students
from students
group by group_id
having avg( (rating >= 4)::int ) > 0.6;

I don't see why two levels of aggregation would be needed.

The avg() works by converting each rating to 0 if less than or equal to 4 or 1 for the higher ones. The average of these values is the ratio that are 1.

Upvotes: 2

Laurenz Albe
Laurenz Albe

Reputation: 247485

First, aggregate the students per group and rating to set a partial sum.

Then calculate the fraction of ratings that are 4 or better and report only those groups where that fraction exceeds 0.6.

SELECT group_n,
       100.0 *
       sum(ratings_per_group) FILTER (WHERE rating >= 4) /
       sum(ratings_per_group)
          AS fraction_of_goodies
FROM (SELECT group_n, rating,
             count(*) AS ratings_per_group
      FROM students
      GROUP BY group_n, rating
     ) AS per_group_and_rating
GROUP BY group_n
HAVING sum(ratings_per_group) FILTER (WHERE rating >= 4) /
       sum(ratings_per_group) > 0.6;

Upvotes: 2

Related Questions