Reputation: 2405
I have the following query in MSSQL
SELECT TOP 50 CustomerID FROM Ratings
WHERE CustomerID != 915
AND MovieID IN (SELECT DISTINCT MovieID FROM Ratings WHERE CustomerID = 915)
GROUP BY CustomerID
ORDER BY count(*) DESC
It is super fast. When I try to use it in a subquery like this.
SELECT * FROM Ratings
WHERE MovieID = 1 AND
CustomerID IN (SELECT TOP 50 CustomerID FROM Ratings
WHERE CustomerID != 915
AND MovieID IN (SELECT DISTINCT MovieID FROM Ratings WHERE CustomerID = 915)
GROUP BY CustomerID
ORDER BY count(*) DESC)
Any ideas on why this is so slow and how I can speed it up? My primary key is (MovieID-CustomerID) and I added a index on CustomerID
Upvotes: 2
Views: 6593
Reputation: 425341
You'll need to create an extra UNIQUE
index on (CustomerID, MovieID)
(in this order) to improve this query.
See the article in my blog for performance details:
Since your subqueries return UNIQUE
sets of values, the query may be rewritten as a JOIN
:
SELECT r2.*
FROM (
SELECT TOP 50 CustomerID
FROM (
SELECT MovieID
FROM Ratings
WHERE CustomerID = 915
) q
JOIN Ratings r
ON r.MovieID = q.MovieID
AND CustomerID <> 915
GROUP BY
CustomerID
ORDER BY
COUNT(*) DESC
) ro
JOIN Ratings r2
ON r2.MovieID = 1
AND r2.CustomerID = ro.CustomerID
To select TOP 50
customers of those who have rated Movie 1
, use:
SELECT r2.*
FROM (
SELECT TOP 50 CustomerID
FROM (
SELECT MovieID
FROM Ratings
WHERE CustomerID = 915
) q
JOIN Ratings r
ON r.MovieID = q.MovieID
AND CustomerID <> 915
AND EXISTS
(
SELECT 1
FROM Ratings re
WHERE re.MovieID = 1
AND re.CustomerID = r.CustomerID
)
GROUP BY
CustomerID
ORDER BY
COUNT(*) DESC
) ro
JOIN Ratings r2
ON r2.MovieID = 1
AND r2.CustomerID = ro.CustomerID
Upvotes: 5