enaJ
enaJ

Reputation: 1655

Find the movies with the highest average rating using SQL max()

This is a question from Stanford online database course exercise. Find the movie(s) with the highest average rating. Return these movie title(s) and their average rating. Use SQLite.

I've seen solutions kindly suggested by others, e.g,

  1. fetch the row with max values.
  2. get top entries.

But what I hope to understand here is where and why my current solution present here went wrong.


The movie rating table:

rID mID stars   ratingDate
201 101 2   2011-01-22
201 101 4   2011-01-27
203 103 2   2011-01-20
203 108 4   2011-01-12
203 108 2   2011-01-30
204 101 3   2011-01-09
205 103 3   2011-01-27
205 104 2   2011-01-22
...

Note: mID represents movie ID, rID represents reviewer ID, stars represent the scores ranked by reviewers.

My first thought is to get the average scores for each movie, using code as following:

Select mID, avg(stars) AS avg_stars
From Rating
Group by mID

The resulting summary table is

mID avg_stars
101 3.0
103 2.5
104 2.5
106 4.5
107 4.0
108 3.3

Then I want to select the max values of the scores column and the assciated mIDs

Select mID, max(avg_stars) AS Max_score
From (
Select mID, avg(stars) AS avg_stars
From Rating
Group by mID) T

I expected to get:

mID Max_score
106 4.5

But instead, I get:

mID Max_score
108 4.5

Upvotes: 2

Views: 12624

Answers (4)

dnoeth
dnoeth

Reputation: 60462

You seem to use MySQL as a DBMS, which allows non-Standard syntax:

When you return mID without adding it to GROUP BY MySQL returns a single row with the maximum(average) and a random value for mID.

This is a generic Standard SQL solution:

Select mID, avg(stars) AS avg_stars
From Rating
Group by mID
having avg(stars) =
 ( select max(avg_stars) -- find the max of the averages
   from 
     ( Select mID, avg(stars) AS avg_stars
       From Rating
       Group by mID
     ) T
 )

This might be very inefficient, that's why there are several proprietary syntax extensions. Most DBMSes (but not MySQL) support Standard SQL's Windowed Aggregate Functions:

select *
from
 ( 
   Select mID, avg(stars) AS avg_stars,
      max(avg(stars)) as max_avg
   From Rating
   Group by mID
 ) T
where avg_stars = max_avg

Edit:

As you added SQLite as DBMS, my 2nd query will not work (SQLite doesn't support Analytical Functions, too).

But as WITH is supported you can simplify #1 to a query similar to @user3580870's:

with cte as 
 ( Select mID, avg(stars) AS avg_stars
   From Rating
   Group by mID
 )
select * from cte
where avg_stars =
 ( select max(avg_stars) -- find the max of the averages
   from cte
 );

And this is still Standard SQL compliant...

Upvotes: 4

Dan Getz
Dan Getz

Reputation: 18217

Perhaps a WITH clause can do the trick. Cannot currently access a live DB to test, but the query should look like:

WITH sq AS 
  (SELECT mID, avg(stars) AS avg_stars FROM rating GROUP BY mID)
SELECT mId,avg_stars FROM sq t1 JOIN 
  (SELECT max(avg_stars) AS max_avg FROM sq) t2 
ON t1.avg_stars = t2.max_avg;

SQLite supports WITH clauses from version 3.8.3. This code was actually tested on the data in the question. It also makes the calculation explicitly more efficient than other solutions (though they may be optimized by some smart query analysis). Furthermore, it is clearest and the WITH clause can offer the path for further twists easily.

Upvotes: 2

Rajeev
Rajeev

Reputation: 903

This is just an extension to @SMA's answer.

If you want to consider duplicates records as single entity you can use the following query

SELECT mID,AVG(stars) 
FROM Rating
WHERE AVG(stars) IN (
            (SELECT AVG(stars)
             FROM Rating
             GROUP BY mID
             ORDER BY avg_stars DESC LIMIT 1)
            )
GROUP BY mID

The above query might not be an optimized one.

Upvotes: 1

SMA
SMA

Reputation: 37023

Instead of subquery try using order by and limit to first result:

SELECT mID, AVG(stars) AS avg_stars
FROM Rating
GROUP BY mID
ORDER BY avg_stars DESC LIMIT 1;

Upvotes: 2

Related Questions