KeithS
KeithS

Reputation: 71565

Which is faster - NOT IN or NOT EXISTS?

I have an insert-select statement that needs to only insert rows where a particular identifier of the row does not exist in either of two other tables. Which of the following would be faster?

INSERT INTO Table1 (...)
SELECT (...) FROM Table2 t2
WHERE ...
   AND NOT EXISTS (SELECT 'Y' from Table3 t3 where t2.SomeFK = t3.RefToSameFK)
   AND NOT EXISTS (SELECT 'Y' from Table4 t4 where t2.SomeFK = t4.RefToSameFK AND ...)

... or...

INSERT INTO Table1 (...)
SELECT (...) FROM Table2 t2
WHERE ...
   AND t2.SomeFK NOT IN (SELECT RefToSameFK from Table3)
   AND t2.SomeFK NOT IN (SELECT RefToSameFK from Table4 WHERE ...)

... or do they perform about the same? Additionally, is there any other way to structure this query that would be preferable? I generally dislike subqueries as they add another "dimension" to the query that increases runtime by polynomial factors.

Upvotes: 13

Views: 18016

Answers (4)

David Castro
David Castro

Reputation: 1957

For bigger tables, it's recomended to use NOT EXISTS/EXISTS, because the IN clause runs the subquery a lot of times depending of the architecture of the tables.

Based on cost optimizer:

There is no difference.

Upvotes: 0

gpeche
gpeche

Reputation: 22504

Usually it does not matter if NOT IN is slower / faster than NOT EXISTS, because they are NOT equivalent in presence of NULL. Read:

NOT IN vs NOT EXISTS

In these cases you almost always want NOT EXISTS, because it has the usually expected behaviour.

If they are equivalent, it is likely that your database already has figured that out and will generate the same execution plan for both.

In the few cases where both options are aquivalent and your database is not able to figure that out, it is better to analyze both execution plans and choose the best options for your specific case.

Upvotes: 14

Larry Lustig
Larry Lustig

Reputation: 50970

It's dependent on the size of the tables, the available indices, and the cardinality of those indices.

If you don't get the same execution plan for both queries, and if neither query plans out to perform a JOIN instead of a sub query, then I would guess that version two is faster. Version one is correlated and therefore would produce many more sub queries, version two can be satisfied with three queries total.

(Also, note that different engines may be biased in one direction or another. Some engines may correctly determine that the queries are the same (if they really are the same) and resolve to the same execution plan.)

Upvotes: 1

Mark Sherretta
Mark Sherretta

Reputation: 10230

You could use a LEFT OUTER JOIN and check if the value in the RIGHT table is NULL. If the value is NULL, the row doesn't exist. That is one way to avoid subqueries.

SELECT (...) FROM Table2 t2
LEFT OUTER JOIN t3 ON (t2.someFk = t3.ref)
WHERE t3.someField IS NULL

Upvotes: 1

Related Questions