James Van Boxtel
James Van Boxtel

Reputation: 2405

MSSQL JOIN ON GROUP BY is too slow

I have the following query in MSSQL

SELECT TOP 50 CustomerID FROM Ratings
WHERE CustomerID != 915
AND MovieID IN (SELECT DISTINCT MovieID FROM Ratings WHERE CustomerID = 915)
GROUP BY CustomerID
ORDER BY count(*) DESC

It is super fast. When I try to use it in a subquery like this.

SELECT * FROM Ratings
WHERE MovieID = 1 AND
CustomerID IN (SELECT TOP 50 CustomerID FROM Ratings
    WHERE CustomerID != 915
    AND MovieID IN (SELECT DISTINCT MovieID FROM Ratings WHERE CustomerID = 915)
    GROUP BY CustomerID
    ORDER BY count(*) DESC)

Any ideas on why this is so slow and how I can speed it up? My primary key is (MovieID-CustomerID) and I added a index on CustomerID

Upvotes: 2

Views: 6593

Answers (1)

Quassnoi
Quassnoi

Reputation: 425341

You'll need to create an extra UNIQUE index on (CustomerID, MovieID) (in this order) to improve this query.

See the article in my blog for performance details:

Since your subqueries return UNIQUE sets of values, the query may be rewritten as a JOIN:

SELECT  r2.*
FROM    (
        SELECT  TOP 50 CustomerID
        FROM    (
                SELECT  MovieID
                FROM    Ratings
                WHERE   CustomerID = 915
                ) q
        JOIN    Ratings r
        ON      r.MovieID = q.MovieID
                AND CustomerID <> 915
        GROUP BY
                CustomerID
        ORDER BY
                COUNT(*) DESC
        ) ro
JOIN    Ratings r2
ON      r2.MovieID = 1
        AND r2.CustomerID = ro.CustomerID

To select TOP 50 customers of those who have rated Movie 1, use:

SELECT  r2.*
FROM    (
        SELECT  TOP 50 CustomerID
        FROM    (
                SELECT  MovieID
                FROM    Ratings
                WHERE   CustomerID = 915
                ) q
        JOIN    Ratings r
        ON      r.MovieID = q.MovieID
                AND CustomerID <> 915
                AND EXISTS
                (
                SELECT  1
                FROM    Ratings re
                WHERE   re.MovieID = 1
                        AND re.CustomerID = r.CustomerID
                )
        GROUP BY
                CustomerID
        ORDER BY
                COUNT(*) DESC
        ) ro
JOIN    Ratings r2
ON      r2.MovieID = 1
        AND r2.CustomerID = ro.CustomerID

Upvotes: 5

Related Questions