Atari2600
Atari2600

Reputation: 1259

SQL IN Statement Slowness

I am having some trouble getting my SQL to run efficiently with an IN statement. If I run the two statements separately, and manually paste in the series of results (in this case there are 30 vendor_id's) the vendor_master query runs instantly, and the invoices query runs in about 2 seconds.

select * FROM invoices where vendor_id IN

(

select vendor_id from vendor_master WHERE vendor_master_id = 12345

);

So what causes the HUGE slowdown, more than 60 seconds and often times out? Is there a way to put the results in a variable with commas? Or to get the inner statement to execute firsT?

Upvotes: 1

Views: 117

Answers (4)

Barranka
Barranka

Reputation: 21067

So what causes the HUGE slowdown, more than 60 seconds and often times out?

The IN clause works well when the data set inside the IN condition is "small" and "deterministic". That's because the condition is evaluated once per row. So, assuming that the query in the IN clause returns 100 rows and the table in the FROM clause has 1000 rows, the server will have to perform 100 * 1000 = 100,000 comparissons to filter out your data. Too much effort to filter too little data, don't you think? Of course, if your data sets (both in from and in clauses) are bigger, you can imagine the effect.

By the way, when you use a subquery as an in condition, there's also an additional overhead: the subquery needs to be executed once for each row. So the sequence is something like this:

  • row 1
    • execute subquery
    • check if the value of row 1 matches a value of the result of the subquery
    • if the above is true, keep the row in the result set; exclude it otherwise
  • row 2
    • execute subquery
    • check if the value of row 2 matches a value of the result of the subquery
    • if the above is true, keep the row in the result set; exclude it otherwise
  • ...

Too much work to do, don't you think?


Is there a way to put the results in a variable with commas?

Yes, there's a way... but would you really want to do that? Let's see:

First, create a list with the values you want to filter:

set @valueList = (select group_concat(vendor_id separator ',')
                 from (select vendor_id from vendor_master where vendor_master_id = 12345) as a)

Then, create an SQL expression:

set @sql = concat('select * from invoices where vendor_id in (', @valueList, ')';

Finally, create a prepared statement and execute it:

prepare stmt from @sql;
execute stmt;
-- when you're done, don't forget to deallocate the statement:
-- deallocate prepare stmt;

I ask you again: do you really want to do all this?


Or to get the inner statement to execute first?

All the other answers are pointing you in the right direction: instead of using in use inner join:

select i.*
from invoices as i
     inner join (
         select distinct vendor_id 
         from vendor_master
         where vendor_master_id = 12345
     ) as vm on i.vendor_id = vm.vendor_id;

If, for some reason, this still is too slow, the only alternative that comes to my mind is: Create a temporary table (sort of "divide-and-conquer strategy"):

drop table if exists temp_vm;
create temporary table temp_vm
    select distinct vendor_id
    from vendor_master
    where vendor_master_id = 12345;
alter table temp_vm
    add index vi(vendor_id);
select i.*
from invoices as i inner join temp_vm as vm on i.vendor_id = vm.vendor_id;

Remember: temp tables are only visible to the connection that creates them, and are dropped when the connection is closed or terminated.


In any case, your performance will be improved if you ensure that your tables are properly indexed; specifically, you need to ensure that invoices.vendor_id and vendor_master.vendor_master_id` are indexed.

Upvotes: 1

Mohammad
Mohammad

Reputation: 11

You can use JOIN with DISTINCT instead of IN:

SELECT *
FROM invoices JOIN 
(
    SELECT DISTINCT vendor_id as vid
    FROM vendor_master
    WHERE vendor_master_id = 12345
) vmi
ON invoices.vendor_in = vmi.vid

Remember that you have to have the DISTINCT, otherwise if there are two records for the inner query, than you will have repeated rows after the JOIN, and the result will be different from IN query.

Upvotes: 1

xdazz
xdazz

Reputation: 160963

You could try using INNER JOIN:

select i.* 
FROM invoices i
INNER JOIN vendor_master vm 
        ON i.vendor_id = vm.vendor_id AND vm.vendor_master_id = 12345

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1271131

Prior to MySQL 5.6.6, in was optimized quite inefficiently. Use exists instead:

select *
FROM invoices i
where exists (select 1
              from vendor_master vm
              where i.vendor_id = vm.vendor_id and vm.vendor_master_id = 12345
             );

For best performance, you want an index on vendor_master(vendor_id, vendor_master_id).

Upvotes: 3

Related Questions