Reputation: 139
I have 2 tables each has about 230000 records. When I make a query:
select count(*)
from table1
where field1 in (select field2 from table2).
It takes about 0.2 second
.
If I use the same query just changing in
to not in
select count(*)
from table1
where field1 NOT in (select field2 from table2).
It never ends
.
Why ?
Upvotes: 3
Views: 92
Reputation: 73
Its better to user not exists, as not in uses row search which takes too long
Upvotes: 0
Reputation: 11665
Try:
SELECT count(*)
FROM table1 t1
LEFT JOIN table2 t2 ON t1.field1 = t2.field2
WHERE t2.primary_key IS NULL
Upvotes: 0
Reputation: 12843
In worst case both queries can be resolved using two full table scans plus a hash join (semi or anti). We're talking a few seconds for 230 000 rows unless there is something exceptionally going on in your case.
My guess is that either field1
or field2
is nullable. When you use a NOT IN
construct, Oracle has to perform an expensive filter operation which is basically executing the inner query once for each row in the outer table. This is 230 000 full table scans...
You can verify this by looking at the the execution plan. It would look something like:
SELECT
FILTER (NOT EXISTS SELECT 0...)
TABLE ACCESS FULL ...
TABLE ACCESS FULL ...
If there are no NULL values in either column (field1, field2) you can help Oracle with this piece of information so another more efficient execution strategy can be used:
select count(*)
from table1
where field1 is not null
and field1 not in (select field2 from table2 where field2 is not null)
This will generate a plan that looks something like:
SELECT
HASH JOIN ANTI
FULL TABLE SCAN ...
FULL TABLE SCAN ...
...or you can change the construct to NOT EXISTS
(will generate the same plan as above):
select count(*)
from table1
where not exists(
select 'x'
from table2
where table2.field2 = table1.field1
);
Please note that changing from NOT IN
to NOT EXISTS
may change the result of the query. Have a look at the following example and try the two different where-clauses to see the difference:
with table1 as(
select 1 as field1 from dual union all
select null as field1 from dual union all
select 2 as field1 from dual
)
,table2 as(
select 1 as field2 from dual union all
select null as field2 from dual union all
select 3 as field2 from dual
)
select *
from table1
--where field1 not in(select field2 from table2)
where not exists(select 'x' from table2 where field1 = field2)
Upvotes: 0
Reputation: 4081
It's the difference between a scan and a seek. When you ask for "IN" you ask for specifically these values. This means the database engine can use indexes to seek to the correct data pages.
When you ask for "NOT IN" you ask for all values except these values. This means the database engine has to scan the entirety of the table/indexes to find all values.
The other factor is the amount of data. The IN query likely involves much less data and therefore much less I/O than the NOT IN.
Compare it to a phonebook, If you want people only named Smith you can just pick the section for Smith and return that. You don't have to read any pages in the book before or any pages after the Smith section. If you ask for all non-Smith - you have to read all pages before Smith and all after Smith. This illustrates both the seek/scan aspect and the data amount aspect.
Upvotes: 3