user93865
user93865

Reputation: 139

oracle not in query takes longer than in query

I have 2 tables each has about 230000 records. When I make a query:

select count(*) 
from table1 
where field1 in (select field2 from table2). 

It takes about 0.2 second.

If I use the same query just changing in to not in

select count(*) 
from table1 
where field1 NOT in (select field2 from table2). 

It never ends.

Why ?

Upvotes: 3

Views: 92

Answers (4)

ayman salah
ayman salah

Reputation: 73

Its better to user not exists, as not in uses row search which takes too long

Upvotes: 0

borjab
borjab

Reputation: 11665

Try:

   SELECT count(*) 
     FROM table1 t1
LEFT JOIN table2 t2 ON t1.field1 = t2.field2
    WHERE t2.primary_key IS NULL

Upvotes: 0

Ronnis
Ronnis

Reputation: 12843

In worst case both queries can be resolved using two full table scans plus a hash join (semi or anti). We're talking a few seconds for 230 000 rows unless there is something exceptionally going on in your case.

My guess is that either field1 or field2 is nullable. When you use a NOT IN construct, Oracle has to perform an expensive filter operation which is basically executing the inner query once for each row in the outer table. This is 230 000 full table scans...

You can verify this by looking at the the execution plan. It would look something like:

SELECT
   FILTER (NOT EXISTS SELECT 0...)
      TABLE ACCESS FULL ...
      TABLE ACCESS FULL ...

If there are no NULL values in either column (field1, field2) you can help Oracle with this piece of information so another more efficient execution strategy can be used:

select count(*) 
 from table1 
 where field1 is not null
   and field1 not in (select field2 from table2 where field2 is not null)

This will generate a plan that looks something like:

SELECT 
   HASH JOIN ANTI
      FULL TABLE SCAN ...
      FULL TABLE SCAN ...

...or you can change the construct to NOT EXISTS (will generate the same plan as above):

select count(*) 
  from table1 
 where not exists(
          select 'x' 
            from table2 
           where table2.field2 = table1.field1
       );

Please note that changing from NOT IN to NOT EXISTS may change the result of the query. Have a look at the following example and try the two different where-clauses to see the difference:

with table1 as(
   select 1    as field1 from dual union all
   select null as field1 from dual union all
   select 2    as field1 from dual
)
,table2 as(
   select 1    as field2 from dual union all   
   select null as field2 from dual union all
   select 3    as field2 from dual
)
select * 
  from table1
 --where field1 not in(select field2 from table2)
  where not exists(select 'x' from table2 where field1 = field2)

Upvotes: 0

Allan S. Hansen
Allan S. Hansen

Reputation: 4081

It's the difference between a scan and a seek. When you ask for "IN" you ask for specifically these values. This means the database engine can use indexes to seek to the correct data pages.

When you ask for "NOT IN" you ask for all values except these values. This means the database engine has to scan the entirety of the table/indexes to find all values.

The other factor is the amount of data. The IN query likely involves much less data and therefore much less I/O than the NOT IN.

Compare it to a phonebook, If you want people only named Smith you can just pick the section for Smith and return that. You don't have to read any pages in the book before or any pages after the Smith section. If you ask for all non-Smith - you have to read all pages before Smith and all after Smith. This illustrates both the seek/scan aspect and the data amount aspect.

Upvotes: 3

Related Questions