Reputation: 437
I dont understand whats functionally different about these 2 queries that would make them so different. First my initial query:
SELECT * FROM XSales_Code SC
WHERE SC.Status = 1
AND SC.SCode NOT IN
(
SELECT DISTINCT SCode FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
AND SCode IS NOT NULL
)
AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc
This took 13 minutes and 6 seconds to execute. Since I'm used to simple queries like this taking several seconds rather then several minutes I played around with and made this query which is, at least in my eyes, equivalent:
SELECT DISTINCT SCode INTO #TEMP1 FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
AND SCode IS NOT NULL
SELECT * FROM XSales_Code SC
WHERE SC.Status = 1
AND SC.SCode NOT IN
(
SELECT Scode FROM #TEMP1
)
AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc
DROP TABLE #TEMP1
The difference is this query takes 2 seconds to execute vs the 13 minutes above. Whats going on here?
Upvotes: 1
Views: 87
Reputation: 26343
In both cases you're using a "correlated subquery", which executes for every row in XSales_Code
that passes the Status = 1 AND Last_Mdt < '2014-01-01'
conditions.
Think of it like this: XSales_Code
is filtered by Status = 1 AND Last_Mdt < '2014-01-01'
, then SQL Server scans each row of this intermediate result, and for every single row it executes your SELECT DISTINCT SCode FROM XTransactions_01...
query to see if the row should be included.
Your second query executes the correlated subquery the same number of times, but it's faster because it's executing against a smaller table.
Generally, the fastest way to do a NOT IN
query is to left join to the "not in" subquery and then omit any rows where the left-joined column is null. This gets rid of the correlated subquery.
SELECT * FROM XSales_Code SC
LEFT JOIN (
SELECT DISTINCT SCode FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
AND SCode IS NOT NULL
) whatevs ON SC.SCode = whatevs.SCode
WHERE SC.Status = 1
AND SC.Last_Mdt < '2014-01-01'
AND whatevs.SCode IS NULL
ORDER BY Last_Mdt desc
This is hard to explain, but try running the query above without the second-to-last line (AND whatevs.SCode IS NULL
) and you'll see how whatevs.SCODE
has a value when the condition is "IN" and is null when the condition is "NOT IN".
Finally, I want to stress that correlated subqueries aren't inherently evil. Generally they work just fine for an IN
condition and plenty of other use cases, but for a NOT IN
condition they tend to be slow.
Upvotes: 1