Reputation: 105
This is a question to which I've never gotten a definitive answer. I am using MySQL in this example.
Given a reasonably large set of values (lets say 500). Is it quicker to search a table using these values with an IN() clause:
SELECT * FROM table WHERE field IN(values)
Or by creating a temporary table in memory, populating it with the values and joining it to the table being searched:
CREATE TEMPORARY TABLE `temp_table` (`field` varchar(255) NOT NULL) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO temp_table VALUES (values)
SELECT * FROM table t1 JOIN temp_table t2 ON t1.field = t2.field
Both methods will produce same result set.
I have done some of my own basic benchmarking tests and found that when dealing with more than 500 values it becomes quicker to use a temporary table than an IN() clause.
Can someone explain to me the inner workings of MySQL and what a proper answer to this question is?
Thanks, Leo
Upvotes: 4
Views: 1129
Reputation: 62564
From MySql online documentation, IN() :
IN (value,...)
If all values are constants, they are evaluated according to the type of expr and sorted. The search for the item then is done using a binary search. This means IN is very quick if the IN value list consists entirely of constants. Otherwise, type conversion takes place according to the rules described in Section 11.2, “Type Conversion in Expression Evaluation”, but applied to all the arguments.
Considering that I believe it makes sense to use IN() with a set of constants, otherwise you should use subquery on another table.
You can consider usign EXISTS() instead of JOIN when items are retrieved from an other table, it would significantly faster for large data set
SELECT *
FROM table t1
WHERE EXISTS
(
SELECT *
FROM temp_table t2
WHERE t1.field = t2.field
)
Upvotes: 2
Reputation: 16953
The correct answer depends on many things.
You've done the work already - if your benchmarking tells you that using a temporary table is faster, then that's the way to go.
Remember to benchmark again if you change the hardware, or dramatically alter the schema.
Upvotes: 1