Reputation: 17834
Below is a SQL statement inside a stored procedure (truncated for brevity):
SELECT *
FROM item a
WHERE a.orderId NOT IN (SELECT orderId FROM table_excluded_item);
This statement takes 30 seconds or so! But if I remove the inner SELECT query, it drops to 1s. table_excluded_item
is not huge, but I suspect the inner query is being executed more than it needs to be.
Is there a more efficient way of doing this?
Upvotes: 17
Views: 15821
Reputation: 1269463
The problem with the left join approach is that duplicate records might be processed in generating the output. Sometimes, this is not the case . . . according to this article, MySQL does optimize the left outer join
correctly when the columns are indexed, even in the presence of duplicates. I admit to remaining skeptical, though, that this optimization always happens.
MySQL sometimes has problems optimizing IN
statements with a subquery. The best fix is a correlated subquery:
SELECT *
FROM item a
WHERE not exists (select 1
from table_excluded_item tei
where tei.orderid = a.orderid
limit 1
)
If you have an index on table_excluded_item.orderid, then this will scan the index and stop at the first value (the limit 1
may not be strictly necessary for this). This is the fastest and safest way to implement what you want in MySQL.
Upvotes: 5
Reputation: 37045
Try this out and compare to the LEFT JOIN
query time:
SELECT *
FROM item a
HAVING orderId NOT IN (SELECT orderId FROM table_excluded_item);
This is frowned upon (using HAVING
when WHERE
can be used) since HAVING
assumes that the limiting condition (orderId
) is part of the result set. But I think in this scenarios it makes more sense (since it is part of the result set) and because it is clearer what is going on than the LEFT JOIN
approach.
It may actually be a bit slower, but post the results so we know if it's any better than your original query.
Upvotes: 1
Reputation: 263683
use LEFT JOIN
SELECT a.*
FROM item a
LEFT JOIN table_excluded_item b
ON a.orderId = b.orderId
WHERE b.orderId IS NULL
make sure that orderId
from both tables has been indexed.
Upvotes: 23