Reputation: 19
My model has 3 tables: ACTORS, MOVIES and MOVIE_CAST, which represents the relationship between ACTORS and MOVIES.
Table: ACTORS
Table: Movies
Table: MOVIE_CAST
I need to list the actors and their birth year for all movies directed by 'Wes Anderson'.
I've tried 2 different queries: in the first I didn't join the tables and in the second I did it. Both gives me the same result but I'm not sure which one is right.
First Query:
SELECT DISTINCT A.NAME, A.BIRTH_YEAR
FROM ACTORS A, MOVIES M
WHERE M.DIRECTOR = 'Wes Anderson'
Second Query:
SELECT DISTINCT A.NAME, A.BIRTH_YEAR
FROM ACTORS A
JOIN MOVIE_CAST MC ON MC.ACTOR = A.NAME AND MC.BIRTH_YEAR = A.BIRTH_YEAR
JOIN MOVIES M ON M.TITLE = MC.MOVIE AND M.RELEASE_YEAR = MC.MOVIE_RELEASE_YEAR
WHERE M.DIRECTOR = 'Wes Anderson'
Which query is the correct one?
P.S.: I'm using PostgreSQL/pgAdmin4 database by the way.
Upvotes: 1
Views: 107
Reputation: 1269973
You can use EXISTS
:
SELECT A.NAME, A.BIRTH_YEAR
FROM ACTORS A
WHERE EXISTS (SELECT 1
FROM MOVIE_CAST MC JOIN
MOVIES M
ON M.TITLE = MC.MOVIE
WHERE MC.ACTOR = A.NAME AND
M.DIRECTOR = 'Wes Anderson'
);
You can eliminate the inner join as well:
SELECT A.NAME, A.BIRTH_YEAR
FROM ACTORS A
WHERE EXISTS (SELECT 1
FROM MOVIE_CAST MC
WHERE MC.ACTOR = A.NAME AND
EXISTS (SELECT 1
FROM MOVIES M
WHERE M.TITLE = MC.MOVIE AND
M.DIRECTOR = 'Wes Anderson'
)
);
When learning SQL, this version is often rendered using IN
rather than EXISTS
:
SELECT A.NAME, A.BIRTH_YEAR
FROM ACTORS A
WHERE A.NAME IN (SELECT MC.ACTOR
FROM MOVIE_CAST MC
WHERE MC.MOVIE IN (SELECT M.TITLE
FROM MOVIES M
WHERE M.DIRECTOR = 'Wes Anderson'
)
);
However, I recommend EXISTS
over IN
for two reasons. First, it is often easier to optimize. More importantly, I ALWAYS recommend NOT EXISTS
over NOT IN
with subqueries due to the handling of NULL
values.
I also think that you are much better off understanding how the JOIN
s work.
Upvotes: 0
Reputation: 35593
The second query is "right".
In the first query you have formed a "Cartesian product" (multiply each row in the actors table by the number of rows in the movies table) and that is why you need to use SELECT DISTINCT
to reduce the number of rows returned. This is an inefficient method.
Upvotes: 2