zhanymkanov
zhanymkanov

Reputation: 606

Select average and group by doesn't work with window functions

I'm trying to select the average rating among all reviews, but also computing values such as total number of reviews, number of reviews grouped by rating, and the percentage of each rating group (e.g. "5": 10, "4": 7, "3": 5)

Problem is, the average rating is calculated incorrectly.

SQL

SELECT
       rating,
       count(*) as rating_count,  /* number of ratings per group(e.g. 5: 10, 4: 7, 3: 5) */
       avg(rating) over() as rating_avg, /* average rating among all reviews */
       count(*) * 100 / sum(count(*)) over() as rating_share, /* percentage share of the rating group */
       sum(count(*)) over() as total_reviews
FROM post_review
WHERE post_id = 1
GROUP BY rating;

Sample data

post_id, rating
1, 5,
1, 3,
1, 5,

Expected rating_avg: 4.33

Resulted rating_avg: 4

Question: I know, that avg(rating) over() as rating_avg, is not a correct statement for selecting the average rating in this case and I can't come up with a proper solution.

Is that possible to do all those computations in one SQL statement?

EDIT: Detailed Result

rating | rating_count | rating_avg | rating_share | total_reviews
__________________________________________________________________
3      |1             | 4          | 33.33        | 3
5      |2             | 4          | 66.67        | 3

Upvotes: 0

Views: 129

Answers (1)

Jonathan Willcock
Jonathan Willcock

Reputation: 5245

You can achieve this with a cte, instead of window functions. Something like this:

with cte as
(SELECT avg(rating) rating_avg,
count(*) total_reviews, post_id 
FROM post_review 
WHERE post_id = 1
GROUP BY post_id
)
SELECT rating,
count(*) as rating_count,
cte.rating_avg,
(count(*) * 100)::numeric / cte.total_reviews::numeric as rating_share,
cte.total_reviews
FROM post_review 
inner join cte ON post_review.post_id = cte.post_id
WHERE post_review.post_id = 1
GROUP BY rating, cte.rating_avg, cte.total_reviews;

The problem with your SQL is the GROUP BY. The average rating is 4 because the GROUP BY condenses the two ratings of 5 into 1, so that the average is the average of 5 and 3 (=4) rather than 5, 5, and 3.

Edit

As an alternative using sum over() you can do:

SELECT 
       rating,
       count(*) as rating_count, 
       sum(rating * count(*)) over() / sum(count(*)) over() as rating_avg, 
       count(*) * 100 / sum(count(*)) over() as rating_share, 
       sum(count(*)) over() as total_reviews
FROM post_review
WHERE post_id = 1
GROUP BY rating;

Upvotes: 1

Related Questions