Reputation: 71
I'm currently working through this https://cs50.harvard.edu/x/2020/psets/7/movies/ and trying to complete 9.sql.
There a database called "movies", with tables: movies (columns: id, title, year), people (id, name, birth) and stars (movie_id, person_id).
The task is:
write a SQL query to list the names of all people who starred in a movie released in 2004, ordered by birth year. It should return 18,013 names.
So far, this is what I've done:
SELECT count(distinct name)
from people
join stars on stars.person_id = people.id
join movies on stars.movie_id = movies.id
WHERE year = 2004;
However, this only returns a count of 17,965...
Can anyone see why this might be?
Upvotes: 2
Views: 8022
Reputation: 3190
If you count(distinct person_id)
, then you will get 18013. It's reasonable that name is not unique. What is unreasonable is the directive in the exam saying you should list only name.
One way to correctly distinct the names is to execute this:
SELECT p.name
from people p
where p.id in (
select distinct s.person_id
from stars s join movies m on s.movie_id = m.id
WHERE m.year = 2004)
And if you do it that way, then you don't even need the distinct
due to the definition of the in
operator. But you probably will get the same execution plan regardless.
It's ok, in my opinion, to list a p.name
more than once if it belongs to another person. The query you wrote would have been ok if the rule started with these words:
If a person's name ...
instead of these words:
If a person ...
Which reminds me of something C. J. Date did in class one day. He put a foil on his projector that projected an image of a smoking pipe on the wall. He then asked: what is this?
Since it was a database class and not a physics class, nobody dared to be a smart-a**.
Upvotes: 7