Nathan
Nathan

Reputation: 437

Minor change to SQL SERVER query causes extremely slow execution time

I dont understand whats functionally different about these 2 queries that would make them so different. First my initial query:

SELECT * FROM XSales_Code SC
    WHERE SC.Status = 1
        AND SC.SCode NOT IN
            (
            SELECT DISTINCT SCode FROM XTransactions_01
            WHERE Last_Mdt > '2012-01-01'
                AND SCode IS NOT NULL
            )
        AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc

This took 13 minutes and 6 seconds to execute. Since I'm used to simple queries like this taking several seconds rather then several minutes I played around with and made this query which is, at least in my eyes, equivalent:

SELECT DISTINCT SCode INTO #TEMP1 FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
    AND SCode IS NOT NULL

SELECT * FROM XSales_Code SC
    WHERE SC.Status = 1
        AND SC.SCode NOT IN
            (
            SELECT Scode FROM #TEMP1
            )
        AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc

DROP TABLE #TEMP1

The difference is this query takes 2 seconds to execute vs the 13 minutes above. Whats going on here?

Upvotes: 1

Views: 87

Answers (1)

Ed Gibbs
Ed Gibbs

Reputation: 26343

In both cases you're using a "correlated subquery", which executes for every row in XSales_Code that passes the Status = 1 AND Last_Mdt < '2014-01-01' conditions.

Think of it like this: XSales_Code is filtered by Status = 1 AND Last_Mdt < '2014-01-01', then SQL Server scans each row of this intermediate result, and for every single row it executes your SELECT DISTINCT SCode FROM XTransactions_01... query to see if the row should be included.

Your second query executes the correlated subquery the same number of times, but it's faster because it's executing against a smaller table.

Generally, the fastest way to do a NOT IN query is to left join to the "not in" subquery and then omit any rows where the left-joined column is null. This gets rid of the correlated subquery.

SELECT * FROM XSales_Code SC
LEFT JOIN (
    SELECT DISTINCT SCode FROM XTransactions_01
    WHERE Last_Mdt > '2012-01-01'
        AND SCode IS NOT NULL
) whatevs ON SC.SCode = whatevs.SCode
WHERE SC.Status = 1
  AND SC.Last_Mdt < '2014-01-01'
  AND whatevs.SCode IS NULL
ORDER BY Last_Mdt desc

This is hard to explain, but try running the query above without the second-to-last line (AND whatevs.SCode IS NULL) and you'll see how whatevs.SCODE has a value when the condition is "IN" and is null when the condition is "NOT IN".

Finally, I want to stress that correlated subqueries aren't inherently evil. Generally they work just fine for an IN condition and plenty of other use cases, but for a NOT IN condition they tend to be slow.

Upvotes: 1

Related Questions