Can we optimize this query?

Question

I was currently solving the Databases exercises on Coursera and encountered a problem.

Problem

You've started a new movie-rating website, and you've been collecting data on reviewers' ratings of various movies. There's not much data yet, but you can still try out some interesting queries. Here's the schema:

Movie ( mID, title, year, director ) 
Reviewer ( rID, name )
Rating ( rID, mID, stars, ratingDate )

Find the titles of all movies that have no ratings.

Solution 1

SELECT DISTINCT title
FROM Movie, Rating
WHERE Movie.mID NOT IN
(SELECT mID
FROM Rating)

Solution 2

SELECT DISTINCT title
FROM Movie, Rating
WHERE NOT EXISTS(SELECT * FROM Rating
WHERE Rating.mID = Movie.mID)

Solution 3

SELECT title
FROM Movie
WHERE title NOT IN(SELECT title
FROM Movie NATURAL JOIN Rating)

I wanted to know whether this can be optimised because I applied DISTINCT to make them run.

Rapha&#235;l Althaus · Accepted Answer

well, first, you don't need to select from Movie,Rating, as you don't do anything with Rating table.

And this is doing an horrible (and useless) cross join => cartesian product, which force you do make a Distinct.

so

select m.title
from Movie m
where not exists (select null -- or select *, doesn't change anything
                  from Rating r
                  where r.mId = m.mID)

you could also do something like that, but I think the previous one is clearer.

select m.title
from Movie m
left join Rating r on r.mId = m.mID
where r.rID is null

A NOT IN query (without join on Rating) is also perfectly valid, but probably the one with poorest performance, and would be written this way.

select m.title 
from Movie m
where m.mId not in (select distinct mID from Rating)

Can we optimize this query?

Problem

Solution 1

Solution 2

Solution 3

Answers (2)

Related Questions