The Lion King
The Lion King

Reputation: 19

Can I select a value from a table when the query restriction is from another table WITHOUT a join?

My model has 3 tables: ACTORS, MOVIES and MOVIE_CAST, which represents the relationship between ACTORS and MOVIES.

Table: ACTORS

Table: Movies

Table: MOVIE_CAST


I need to list the actors and their birth year for all movies directed by ​'Wes Anderson'.

I've tried 2 different queries: in the first I didn't join the tables and in the second I did it. Both gives me the same result but I'm not sure which one is right.

First Query:

SELECT DISTINCT A.NAME, A.BIRTH_YEAR
FROM ACTORS A, MOVIES M
WHERE M.DIRECTOR = 'Wes Anderson'

Second Query:

SELECT DISTINCT A.NAME, A.BIRTH_YEAR
FROM ACTORS A
    JOIN MOVIE_CAST MC ON MC.ACTOR = A.NAME AND MC.BIRTH_YEAR = A.BIRTH_YEAR
    JOIN MOVIES M ON M.TITLE = MC.MOVIE AND M.RELEASE_YEAR = MC.MOVIE_RELEASE_YEAR
WHERE M.DIRECTOR = 'Wes Anderson'

Which query is the correct one?

P.S.: I'm using PostgreSQL/pgAdmin4 database by the way.

Upvotes: 1

Views: 107

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269973

You can use EXISTS:

SELECT A.NAME, A.BIRTH_YEAR
FROM ACTORS A
WHERE EXISTS (SELECT 1
              FROM MOVIE_CAST MC JOIN
                   MOVIES M
                   ON M.TITLE = MC.MOVIE
              WHERE MC.ACTOR = A.NAME AND 
                    M.DIRECTOR = 'Wes Anderson'
             );

You can eliminate the inner join as well:

SELECT A.NAME, A.BIRTH_YEAR
FROM ACTORS A
WHERE EXISTS (SELECT 1
              FROM MOVIE_CAST MC 
              WHERE MC.ACTOR = A.NAME AND
                    EXISTS (SELECT 1
                            FROM MOVIES M
                            WHERE M.TITLE = MC.MOVIE AND
                                  M.DIRECTOR = 'Wes Anderson'
                           )
             );

When learning SQL, this version is often rendered using IN rather than EXISTS:

SELECT A.NAME, A.BIRTH_YEAR
FROM ACTORS A
WHERE A.NAME IN (SELECT MC.ACTOR
                 FROM MOVIE_CAST MC 
                 WHERE MC.MOVIE IN (SELECT M.TITLE
                                    FROM MOVIES M
                                    WHERE  M.DIRECTOR = 'Wes Anderson'
                                   )
                );

However, I recommend EXISTS over IN for two reasons. First, it is often easier to optimize. More importantly, I ALWAYS recommend NOT EXISTS over NOT IN with subqueries due to the handling of NULL values.

I also think that you are much better off understanding how the JOINs work.

Upvotes: 0

Paul Maxwell
Paul Maxwell

Reputation: 35593

The second query is "right".

In the first query you have formed a "Cartesian product" (multiply each row in the actors table by the number of rows in the movies table) and that is why you need to use SELECT DISTINCT to reduce the number of rows returned. This is an inefficient method.

Upvotes: 2

Related Questions