pixelfreak
pixelfreak

Reputation: 17834

MySQL WHERE NOT IN extremely slow

Below is a SQL statement inside a stored procedure (truncated for brevity):

SELECT * 
FROM item a 
WHERE a.orderId NOT IN (SELECT orderId FROM table_excluded_item);

This statement takes 30 seconds or so! But if I remove the inner SELECT query, it drops to 1s. table_excluded_item is not huge, but I suspect the inner query is being executed more than it needs to be.

Is there a more efficient way of doing this?

Upvotes: 17

Views: 15821

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1269463

The problem with the left join approach is that duplicate records might be processed in generating the output. Sometimes, this is not the case . . . according to this article, MySQL does optimize the left outer join correctly when the columns are indexed, even in the presence of duplicates. I admit to remaining skeptical, though, that this optimization always happens.

MySQL sometimes has problems optimizing IN statements with a subquery. The best fix is a correlated subquery:

SELECT * 
FROM item a 
WHERE not exists (select 1
                  from table_excluded_item tei
                  where tei.orderid = a.orderid
                  limit 1
                 )

If you have an index on table_excluded_item.orderid, then this will scan the index and stop at the first value (the limit 1 may not be strictly necessary for this). This is the fastest and safest way to implement what you want in MySQL.

Upvotes: 5

Anthony
Anthony

Reputation: 37045

Try this out and compare to the LEFT JOIN query time:

SELECT * 
FROM item a 
HAVING orderId NOT IN (SELECT orderId FROM table_excluded_item);

This is frowned upon (using HAVING when WHERE can be used) since HAVING assumes that the limiting condition (orderId) is part of the result set. But I think in this scenarios it makes more sense (since it is part of the result set) and because it is clearer what is going on than the LEFT JOIN approach.

It may actually be a bit slower, but post the results so we know if it's any better than your original query.

Upvotes: 1

John Woo
John Woo

Reputation: 263683

use LEFT JOIN

SELECT  a.* 
FROM    item a 
        LEFT JOIN table_excluded_item b
            ON a.orderId = b.orderId
WHERE   b.orderId IS NULL

make sure that orderId from both tables has been indexed.

Upvotes: 23

Related Questions