Reputation: 111
I've been playing around it for the whole day and it's by far the most hard topic to understand in SQL.
Say we have a students table, which consists of group number and students rating as so:
Each group can contain multiple students
And now I want to look for groups where at least 60% of students have rating of 4 or higher.
Expecting something like:
group_n | percentage_of_goodies |
---|---|
1120 | 0.7 |
1200 | 0.66 |
1111 | 1 |
I tried this
with group_goodies as (
select group_n, count(id) goodies from students
where rating >= 4
group by group_n
), group_counts as (
select group_n, count(id) acount from students
group by group_n
)
select cast(group_goodies.goodies as float)/group_counts.acount from group_counts, group_goodies
where cast(group_goodies.goodies as float)/group_counts.acount > 0.6;
where percentage seems to surpass 100% (and it's not because I misplaced denominator, since there are controversial outputs below as well), which is obviously is not intended. There are also more output rows than there are groups. Apparently, I could use window functions, but I can't figure it out myself.. So how can I have this query done?
Problem is extracting count of students before and after the query seems to be impossible within a single query, so I had to create 2 CTEs in order to receive needed results. Both of the CTEs seems to output proper result, where in first CTE amount of students rarely exceeds 10, and in second CTE amounts are naturally smaller and match needs. But When I divide them like that, it results in something unreasonable.
If someone explains it properly, one will make my day 😳
Upvotes: 0
Views: 511
Reputation: 1270633
If I understand correctly, this is a pretty direct aggregation query:
select group_id, avg( (rating >= 4)::int ) as good_students
from students
group by group_id
having avg( (rating >= 4)::int ) > 0.6;
I don't see why two levels of aggregation would be needed.
The avg()
works by converting each rating to 0
if less than or equal to 4 or 1
for the higher ones. The average of these values is the ratio that are 1
.
Upvotes: 2
Reputation: 247485
First, aggregate the students per group and rating to set a partial sum.
Then calculate the fraction of ratings that are 4 or better and report only those groups where that fraction exceeds 0.6.
SELECT group_n,
100.0 *
sum(ratings_per_group) FILTER (WHERE rating >= 4) /
sum(ratings_per_group)
AS fraction_of_goodies
FROM (SELECT group_n, rating,
count(*) AS ratings_per_group
FROM students
GROUP BY group_n, rating
) AS per_group_and_rating
GROUP BY group_n
HAVING sum(ratings_per_group) FILTER (WHERE rating >= 4) /
sum(ratings_per_group) > 0.6;
Upvotes: 2