Reputation: 27536
I have the following SQL:
IF EXISTS
(
SELECT
1
FROM
SomeTable T1
WHERE
SomeField = 1
AND SomeOtherField = 1
AND NOT EXISTS(SELECT 1 FROM SomeOtherTable T2 WHERE T2.KeyField = T1.KeyField)
)
RAISERROR ('Blech.', 16, 1)
The SomeTable
table has around 200,000 rows, and the SomeOtherTable
table has about the same.
If I execute the inner SQL (the SELECT
), it executes in sub-second time, returning no rows. But, if I execute the entire script (IF...RAISERROR
) then it takes well over an hour. Why?
Now, obviously, the execution plan is different - I can see that in Enterprise Manager - but again, why?
I could probably do something like SELECT @num = COUNT(*) WHERE
... and then IF @num > 0 RAISERROR
but... I think that's missing the point somewhat. You can only code around a bug (and it sure looks like a bug to me) if you know that it exists.
EDIT:
I should mention that I already tried re-jigging the query into an OUTER JOIN as per @Bohemian's answer, but this made no difference to the execution time.
EDIT 2:
I've attached the query plan for the inner SELECT
statement:
... and the query plan for the whole IF...RAISERROR
block:
Obviously these show the real table/field names, but apart from that the query is exactly as shown above.
Upvotes: 11
Views: 1026
Reputation: 171206
The IF
does not magically turn off optimizations or damage the plan. The optimizer just noticed that EXISTS
only needs one row at most (like a TOP 1
). This is called a "row goal" and it normally happens when you do paging. But also with EXISTS
, IN
, NOT IN
and such things.
My guess: if you write TOP 1
to the original query you get the same (bad) plan.
The optimizer tries to be smart here and only produce the first row using much cheaper operations. Unfortunately, it misestimates cardinality. It guesses that the query will produce lots of rows although in reality it produces none. If it estimated correctly you'd just get a more efficient plan, or it would not do the transformation at all.
I suggest the following steps:
IF (SELECT COUNT(*) FROM ...) > 0
which will give the original plan because the optimizer does not have a row goal.Upvotes: 6
Reputation: 812
Please try SELECT TOP 1 KeyField
. Using primary key will work faster in my guess.
NOTE: I posted this as answer as I couldn't comment.
Upvotes: 0
Reputation: 425198
It's probably because the optimizer can figure out how to turn your query into a more efficient query, but somehow the IF prevents that. Only an EXPLAIN will tell you why the query is taking so long, but I can tell you how to make this whole thing more efficient... Indtead of using a correlated subquery, which is incredibly inefficient - you get "n" subqueries run for "n" rows in the main table - use a JOIN.
Try this:
IF EXISTS (
SELECT 1
FROM SomeTable T1
LEFT JOIN SomeOtherTable T2 ON T2.KeyField = T1.KeyField
WHERE SomeField = 1
AND SomeOtherField = 1
AND T2.KeyField IS NULL
) RAISERROR ('Blech.', 16, 1)
The "trick" here is to use s LEFT JOIN and filter out all joined rows by testing for a null in the WHERE clause, which is executed after the join is made.
Upvotes: 2