Reputation: 1259
I am having some trouble getting my SQL to run efficiently with an IN statement. If I run the two statements separately, and manually paste in the series of results (in this case there are 30 vendor_id's) the vendor_master query runs instantly, and the invoices query runs in about 2 seconds.
select * FROM invoices where vendor_id IN
(
select vendor_id from vendor_master WHERE vendor_master_id = 12345
);
So what causes the HUGE slowdown, more than 60 seconds and often times out? Is there a way to put the results in a variable with commas? Or to get the inner statement to execute firsT?
Upvotes: 1
Views: 117
Reputation: 21067
So what causes the HUGE slowdown, more than 60 seconds and often times out?
The IN
clause works well when the data set inside the IN
condition is "small" and "deterministic". That's because the condition is evaluated once per row. So, assuming that the query in the IN
clause returns 100 rows and the table in the FROM
clause has 1000 rows, the server will have to perform 100 * 1000 = 100,000
comparissons to filter out your data. Too much effort to filter too little data, don't you think? Of course, if your data sets (both in from
and in
clauses) are bigger, you can imagine the effect.
By the way, when you use a subquery as an in
condition, there's also an additional overhead: the subquery needs to be executed once for each row. So the sequence is something like this:
Too much work to do, don't you think?
Is there a way to put the results in a variable with commas?
Yes, there's a way... but would you really want to do that? Let's see:
First, create a list with the values you want to filter:
set @valueList = (select group_concat(vendor_id separator ',')
from (select vendor_id from vendor_master where vendor_master_id = 12345) as a)
Then, create an SQL expression:
set @sql = concat('select * from invoices where vendor_id in (', @valueList, ')';
Finally, create a prepared statement and execute it:
prepare stmt from @sql;
execute stmt;
-- when you're done, don't forget to deallocate the statement:
-- deallocate prepare stmt;
I ask you again: do you really want to do all this?
Or to get the inner statement to execute first?
All the other answers are pointing you in the right direction: instead of using in
use inner join
:
select i.*
from invoices as i
inner join (
select distinct vendor_id
from vendor_master
where vendor_master_id = 12345
) as vm on i.vendor_id = vm.vendor_id;
If, for some reason, this still is too slow, the only alternative that comes to my mind is: Create a temporary table (sort of "divide-and-conquer strategy"):
drop table if exists temp_vm;
create temporary table temp_vm
select distinct vendor_id
from vendor_master
where vendor_master_id = 12345;
alter table temp_vm
add index vi(vendor_id);
select i.*
from invoices as i inner join temp_vm as vm on i.vendor_id = vm.vendor_id;
Remember: temp tables are only visible to the connection that creates them, and are dropped when the connection is closed or terminated.
In any case, your performance will be improved if you ensure that your tables are properly indexed; specifically, you need to ensure that invoices.vendor_id
and vendor_master.vendor_master_id` are indexed.
Upvotes: 1
Reputation: 11
You can use JOIN
with DISTINCT
instead of IN
:
SELECT *
FROM invoices JOIN
(
SELECT DISTINCT vendor_id as vid
FROM vendor_master
WHERE vendor_master_id = 12345
) vmi
ON invoices.vendor_in = vmi.vid
Remember that you have to have the DISTINCT
, otherwise if there are two records for the inner query, than you will have repeated rows after the JOIN
, and the result will be different from IN
query.
Upvotes: 1
Reputation: 160963
You could try using INNER JOIN
:
select i.*
FROM invoices i
INNER JOIN vendor_master vm
ON i.vendor_id = vm.vendor_id AND vm.vendor_master_id = 12345
Upvotes: 1
Reputation: 1271131
Prior to MySQL 5.6.6, in
was optimized quite inefficiently. Use exists
instead:
select *
FROM invoices i
where exists (select 1
from vendor_master vm
where i.vendor_id = vm.vendor_id and vm.vendor_master_id = 12345
);
For best performance, you want an index on vendor_master(vendor_id, vendor_master_id)
.
Upvotes: 3