Ilyas Oirraq
Ilyas Oirraq

Reputation: 43

MYSQL (NOT IN) query is very slow

i don't know why this query take 45 min to give me the result

table1= 831293 row

table2= 2000.000 row

SELECT ID, name FROM table1 WHERE name not IN (SELECT name FROM table2 )  LIMIT 831293

my.cnf file look like this :

max_allowed_packet  = 64M
innodb_buffer_pool_size = 8G
innodb_log_file_size    = 256M 
query_cache_type        = 1
query_cache_limit   = 1M

Upvotes: 1

Views: 2525

Answers (4)

matt stucky
matt stucky

Reputation: 165

agreed on: (a) add the indexes and (b) the limit clause is useless.

consider a 'left outer join' - it will bring all rows from the "left" table even where no join value in the "right" table exists - then you filter out the non-desired "right" values.

something like:

SELECT t1.ID, t1.name 
FROM table1 t1 left outer join table2 t2 
on t1.name = t2.name
where t2.name is null;

note: sql above assumes name is unique in t2, and not unique in t1.

Upvotes: 1

VJ Hil
VJ Hil

Reputation: 904

I think using Exist will be faster.Try This

select Id,name from
(
    select table1.ID as ID, table1.name as name  
    FROM table1 inner join table2 
    on table1.name=table2.name
)
    WHERE not EXISTS
    (
    select 1  
    FROM table1 inner join table2 
    on table1.name=table2.name);

Upvotes: 0

Pinna_be
Pinna_be

Reputation: 4617

Do you have an index on name?

If you have to loop each time through table2 per row, you effectively loop through 831293*200000 rows, which is a lot.

When you have an index on name, you can get a major performance increase, as it is easier to search for a name in this index. (probably even O(1) as there will probably be a hashtable)

you can do this as follows

ALTER TABLE `table2` ADD INDEX (`name`)

Upvotes: 1

Michal Krasny
Michal Krasny

Reputation: 5916

This select is very expensive because in the worst case for every record in table1 you must go through 2.000.000 records in table2. I am not MySQL specialist, but on the Oracle platform I would consider indexing and table spaces for table2.

BTW, the LIMIT part is useless.

Upvotes: 1

Related Questions