Reputation: 3
I wrote a SQL query to answer the following question:
Find all the actors that made more movies with Yash Chopra than any other director in the IMBD database.
Sample schema:
person
(pid *
,name
);
m_cast
(mid *
,pid *
);
m_director
(mid*
,pid*
);
* = (component of) PRIMARY KEY
Following is my query:
WITH common_actors AS
(SELECT A.actor_id as actors, B.director_id as director_id, B.movies as movies_with_director,
B.director_id as yash_chops_id, B.movies as movies_with_yash_chops FROM
(SELECT M_Cast.PID as actor_id, M_Director.PID as director_id, COUNT(*) as movies from M_Cast
left join M_Director
ON M_Cast.MID = M_Director.MID
GROUP BY actor_id, director_id) A
JOIN
(SELECT M_Cast.PID as actor_id, M_Director.PID as director_id, COUNT(*) as movies from M_Cast
left join M_Director
ON M_Cast.MID = M_Director.MID
GROUP BY actor_id, director_id
)B
ON A.actor_id = B.actor_id
WHERE B.director_id in (SELECT PID FROM Person WHERE Name LIKE
'%Yash%Chopra%'))
SELECT distinct actors as actor_id, movies_with_yash_chops as total_movies FROM common_actors
WHERE actors NOT IN (SELECT actors FROM common_actors WHERE movies_with_director > movies_with_yash_chops)
And the result obtained from this is of length: 430 rows. However the result obtained should be of length 243 rows. Could anyone please suggest where I went wrong in my query? Is my approach right?
Sample result:
Actor name
0 Sharib Hashmi
1 Kulbir Badesron
2 Gurdas Maan
3 Parikshat Sahni
...
242 Ramlal Shyamlal
Thanks in advance!
Upvotes: 0
Views: 1098
Reputation: 33945
Consider the following:
DROP TABLE IF EXISTS person;
CREATE TABLE person
(person_id SERIAL PRIMARY KEY
,name VARCHAR(20) NOT NULL UNIQUE
);
DROP TABLE IF EXISTS movie;
CREATE TABLE movie
(movie_id SERIAL PRIMARY KEY
,title VARCHAR(50) NOT NULL UNIQUE
);
DROP TABLE IF EXISTS m_cast;
CREATE TABLE m_cast
(movie_id INT NOT NULL
,person_id INT NOT NULL
,PRIMARY KEY(movie_id,person_id)
);
DROP TABLE IF EXISTS m_director;
CREATE TABLE m_director
(movie_id INT NOT NULL
,person_id INT NOT NULL
,PRIMARY KEY(movie_id,person_id)
);
INSERT INTO person (name) VALUES
('Steven Feelberg'),
('Manly Kubrick'),
('Alfred Spatchcock'),
('Fred Pitt'),
('Raphael DiMaggio'),
('Bill Smith');
INSERT INTO movie VALUES
(1,'Feelberg\'s Movie with Fred & Raph'),
(2,'Feelberg and Fred Ride Again'),
(3,'Kubrick shoots DiMaggio'),
(4,'Kubrick\'s Movie with Bill Smith'),
(5,'Spatchcock Presents Bill Smith');
INSERT INTO m_director VALUES
(1,1),
(2,1),
(3,2),
(4,2),
(5,3);
INSERT INTO m_cast VALUES
(1,4),
(1,5),
(2,4),
(3,5),
(4,6),
(5,6);
I've included the movie table only for ease of reference. It's not relevant to the actual problem. Also, note that this model assumes that cast members are only listed once, regardless of whether or not they have multiple roles in a given film.
The following query asks 'how often have each actor and director worked together'...
An actor is any person who has been a cast member of any movie. A director is any person who has been a director of any movie.
SELECT a.name actor
, d.name director
, COUNT(DISTINCT ma.movie_id) total
FROM person d
JOIN m_director md
ON md.person_id = d.person_id
JOIN person a
LEFT
JOIN m_cast ma
ON ma.person_id = a.person_id
AND ma.movie_id = md.movie_id
JOIN m_cast x
ON x.person_id = a.person_id
GROUP
BY actor
, director;
+-------------------+-------------------+-------+
| actor | director | total |
+-------------------+-------------------+-------+
| Fred Pitt | Alfred Spatchcock | 0 |
| Fred Pitt | Manly Kubrick | 0 |
| Fred Pitt | Steven Feelberg | 2 |
| Raphael DiMaggio | Alfred Spatchcock | 0 |
| Raphael DiMaggio | Manly Kubrick | 1 |
| Raphael DiMaggio | Steven Feelberg | 1 |
| Bill Smith | Alfred Spatchcock | 1 |
| Bill Smith | Manly Kubrick | 1 |
| Bill Smith | Steven Feelberg | 0 |
+-------------------+-------------------+-------+
By observation, we can see that:
EDIT: While I'm not seriously advocating this as a solution, the following is simply to demonstrate that the kernel provided above is really all you need to solve the problem...
SELECT x.*
FROM
( SELECT a.*
FROM
( SELECT a.name actor
, d.name director
, COUNT(DISTINCT ma.movie_id) total
FROM person d
JOIN m_director md
ON md.person_id = d.person_id
JOIN person a
LEFT
JOIN m_cast ma
ON ma.person_id = a.person_id
AND ma.movie_id = md.movie_id
JOIN m_cast x
ON x.person_id = a.person_id
GROUP
BY actor
, director
) a
LEFT
JOIN
( SELECT a.name actor
, d.name director
, COUNT(DISTINCT ma.movie_id) total
FROM person d
JOIN m_director md
ON md.person_id = d.person_id
JOIN person a
LEFT
JOIN m_cast ma
ON ma.person_id = a.person_id
AND ma.movie_id = md.movie_id
JOIN m_cast x
ON x.person_id = a.person_id
GROUP
BY actor
, director
) b
ON b.actor = a.actor
AND b.director <> a.director
AND b.total > a.total
WHERE b.actor IS NULL
) x
LEFT JOIN
( SELECT a.*
FROM
( SELECT a.name actor
, d.name director
, COUNT(DISTINCT ma.movie_id) total
FROM person d
JOIN m_director md
ON md.person_id = d.person_id
JOIN person a
LEFT
JOIN m_cast ma
ON ma.person_id = a.person_id
AND ma.movie_id = md.movie_id
JOIN m_cast x
ON x.person_id = a.person_id
GROUP
BY actor
, director
) a
LEFT
JOIN
( SELECT a.name actor
, d.name director
, COUNT(DISTINCT ma.movie_id) total
FROM person d
JOIN m_director md
ON md.person_id = d.person_id
JOIN person a
LEFT
JOIN m_cast ma
ON ma.person_id = a.person_id
AND ma.movie_id = md.movie_id
JOIN m_cast x
ON x.person_id = a.person_id
GROUP
BY actor
, director
) b
ON b.actor = a.actor
AND b.director <> a.director
AND b.total > a.total
WHERE b.actor IS NULL
) y
ON y.actor = x.actor AND y.director <> x.director
WHERE y.actor IS NULL;
+-----------+-----------------+-------+
| actor | director | total |
+-----------+-----------------+-------+
| Fred Pitt | Steven Feelberg | 2 |
+-----------+-----------------+-------+
This returns a list of every actor, and the director with whom they've worked most often. In this case, because Bill Smith and Raphael DiMaggio have worked most often equally with two directors, they are excluded from the result.
The answer to your problem is simply to select from this list all rows with Yash Chopra listed as the director.
Upvotes: -1