Reputation: 27536

Why does IF (query) take over an hour, when query takes less than a second?

I have the following SQL:

 IF EXISTS
 (
    SELECT
        1
    FROM
        SomeTable T1
    WHERE
        SomeField = 1
    AND SomeOtherField = 1
    AND NOT EXISTS(SELECT 1 FROM SomeOtherTable T2 WHERE T2.KeyField = T1.KeyField)
)
    RAISERROR ('Blech.', 16, 1)

The SomeTable table has around 200,000 rows, and the SomeOtherTable table has about the same.

If I execute the inner SQL (the SELECT), it executes in sub-second time, returning no rows. But, if I execute the entire script (IF...RAISERROR) then it takes well over an hour. Why?

Now, obviously, the execution plan is different - I can see that in Enterprise Manager - but again, why?

I could probably do something like SELECT @num = COUNT(*) WHERE ... and then IF @num > 0 RAISERROR but... I think that's missing the point somewhat. You can only code around a bug (and it sure looks like a bug to me) if you know that it exists.

EDIT:

I should mention that I already tried re-jigging the query into an OUTER JOIN as per @Bohemian's answer, but this made no difference to the execution time.

EDIT 2:

I've attached the query plan for the inner SELECT statement:

Query Plan - inner SELECT statement

... and the query plan for the whole IF...RAISERROR block:

Query Plan - whole IF statement

Obviously these show the real table/field names, but apart from that the query is exactly as shown above.

Upvotes: 11

Answers (3)

usr

Reputation: 171206

The IF does not magically turn off optimizations or damage the plan. The optimizer just noticed that EXISTS only needs one row at most (like a TOP 1). This is called a "row goal" and it normally happens when you do paging. But also with EXISTS, IN, NOT IN and such things.

My guess: if you write TOP 1 to the original query you get the same (bad) plan.

The optimizer tries to be smart here and only produce the first row using much cheaper operations. Unfortunately, it misestimates cardinality. It guesses that the query will produce lots of rows although in reality it produces none. If it estimated correctly you'd just get a more efficient plan, or it would not do the transformation at all.

I suggest the following steps:

fix the plan by reviewing indexes and statistics
if this didn't help, change the query to IF (SELECT COUNT(*) FROM ...) > 0 which will give the original plan because the optimizer does not have a row goal.

Upvotes: 6

Maximus

Reputation: 812

Please try SELECT TOP 1 KeyField. Using primary key will work faster in my guess.

NOTE: I posted this as answer as I couldn't comment.

Upvotes: 0

Bohemian

Reputation: 425198

It's probably because the optimizer can figure out how to turn your query into a more efficient query, but somehow the IF prevents that. Only an EXPLAIN will tell you why the query is taking so long, but I can tell you how to make this whole thing more efficient... Indtead of using a correlated subquery, which is incredibly inefficient - you get "n" subqueries run for "n" rows in the main table - use a JOIN.

Try this:

IF EXISTS (
  SELECT 1
  FROM SomeTable T1
  LEFT JOIN SomeOtherTable T2 ON T2.KeyField = T1.KeyField
  WHERE SomeField = 1
  AND SomeOtherField = 1
  AND T2.KeyField IS NULL
) RAISERROR ('Blech.', 16, 1)

The "trick" here is to use s LEFT JOIN and filter out all joined rows by testing for a null in the WHERE clause, which is executed after the join is made.

Upvotes: 2

Why does IF (query) take over an hour, when query takes less than a second?

Answers (3)

Related Questions