Reputation: 63104
I'm using Hibernate's JPA implementation with MySQL 5.0.67. MySQL is configured to use InnoDB.
In performing a JPA query (which is translated to SQL), I've discovered that using the IN
clause is slower than performing individual queries. Example:
SELECT p FROM Person p WHERE p.name IN ('Joe', 'Jane', 'Bob', 'Alice')
is slower than four separate queries:
SELECT p FROM Person p WHERE p.name = 'Joe'
SELECT p FROM Person p WHERE p.name = 'Jane'
SELECT p FROM Person p WHERE p.name = 'Bob'
SELECT p FROM Person p WHERE p.name = 'Alice'
Why is this? Is this a MySQL performance limitation?
Upvotes: 5
Views: 6241
Reputation: 39
You can make the IN clause faster if you fetch the values first and then embed those values into the IN clause, instead of embedding a subquery into the SQL statement.
Here is an example of using the IN clause:
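A minimal sketch of that idea, using Python's built-in sqlite3 module as a stand-in for MySQL so it can run anywhere. The `member` table, the `team` column and the function name are hypothetical, made up only for illustration. The inner query runs first on its own, and its values are then bound into the IN list with one placeholder per value:

```python
import sqlite3

def fetch_people_by_team(conn, team):
    # Step 1: run what would have been the subquery on its own.
    names = [row[0] for row in conn.execute(
        "SELECT name FROM member WHERE team = ?", (team,))]
    if not names:
        return []  # an empty IN () list is not valid SQL, so stop early

    # Step 2: build one "?" placeholder per value and bind the values,
    # rather than pasting them into the SQL text.
    placeholders = ", ".join("?" for _ in names)
    sql = (f"SELECT name, age FROM person "
           f"WHERE name IN ({placeholders}) ORDER BY name")
    return conn.execute(sql, names).fetchall()

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (name TEXT, age INTEGER);
CREATE TABLE member (name TEXT, team TEXT);
INSERT INTO person VALUES ('Joe', 30), ('Jane', 25), ('Bob', 41), ('Alice', 37);
INSERT INTO member VALUES ('Joe', 'red'), ('Alice', 'red'), ('Bob', 'blue');
""")
people = fetch_people_by_team(conn, "red")
```

Binding the values through placeholders keeps the query safe from injection while still sending the database a plain list of literals.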
Upvotes: 0
Reputation: 7729
For me, the IN clause is an invaluable tool, even if there is a slight lag compared with individual queries: it can free the database and tables up for use by other connections, and there are application-structure benefits to using it.
The following technique is utilized in almost every PHP/MySQL application I construct.
I use the IN clause quite a bit with numerical keys:
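For example, to grab four master items and all of their subitems:
$master_arr = array();
$master_result = mysql_query(
    "SELECT * FROM master_table WHERE master_id IN (1,7,9,10)"
);
// key each master row by its id so subitems can be attached to it later
while ($row = mysql_fetch_assoc($master_result)) {
    $master_arr[$row['master_id']] = $row;
}
then:
$subitem_result = mysql_query(
    "SELECT * FROM subitems_table WHERE par_master_id IN (1,7,9,10)"
);
then add the subitems to the master items:
while ($sv = mysql_fetch_assoc($subitem_result)) {
    $m_key = $sv['par_master_id'];
    $s_key = $sv['subitem_id'];
    $master_arr[$m_key]['subitem'][$s_key] = $sv;
}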
This does two things: (1) the tables are not all held at once, as they would be with a join, and (2) only two MySQL queries are needed to produce a tree of data.
Upvotes: 0
Reputation: 7144
A query as simple as the IN demonstrated shouldn't have an issue with the optimizer choosing to use the index. The UNION work mentioned by Bill is only required occasionally when you have more complex queries. It could be an issue with index statistics.
Have you done an ANALYZE on the table in question?
How many rows are in the table and how many match the IN clause?
What does EXPLAIN say for the queries in question?
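Those three checks can be tried out in miniature. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it illustrates the workflow rather than MySQL's own optimizer: SQLite spells the commands `ANALYZE` and `EXPLAIN QUERY PLAN`, while in MySQL they would be `ANALYZE TABLE person` and `EXPLAIN SELECT ...`. The table layout, index name and row counts are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_person_name ON person (name)")

# 1000 filler rows plus the four names from the question (hypothetical data).
conn.executemany("INSERT INTO person (name) VALUES (?)",
                 [(f"user{i}",) for i in range(1000)])
conn.executemany("INSERT INTO person (name) VALUES (?)",
                 [("Joe",), ("Jane",), ("Bob",), ("Alice",)])

# Refresh the index statistics the planner relies on.
conn.execute("ANALYZE")

names = ["Joe", "Jane", "Bob", "Alice"]
in_sql = "SELECT id, name FROM person WHERE name IN (?, ?, ?, ?)"

# How many rows are in the table, and how many match the IN list?
total_rows = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
matching_rows = len(conn.execute(in_sql, names).fetchall())

# Ask the planner how it intends to run the query. The last column of each
# returned row is a human-readable description of one step of the plan.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + in_sql, names)]
uses_index = any("idx_person_name" in step for step in plan)
```

If the plan showed a full table scan instead of the index, stale statistics or a missing index on `name` would be the first things to look at.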
Upvotes: 1
Reputation: 562681
This is a known deficiency in MySQL.
It is often true that using UNION performs better than a range query like the one you show. MySQL doesn't employ indexes very intelligently for expressions using IN (...). A similar hole exists in the optimizer for boolean expressions with OR.
See http://www.mysqlperformanceblog.com/2006/08/10/using-union-to-implement-loose-index-scan-to-mysql/ for some explanation and detailed benchmarks.
The optimizer is being improved all the time. A deficiency in one version of MySQL may be improved in a subsequent version. So it's worth testing your queries on different versions.
It is also advantageous to use UNION ALL instead of simply UNION. Both queries use a temporary table to store results, but the difference is that UNION applies DISTINCT to the result set, which incurs an additional un-indexed sort.
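To make the rewrite concrete, here is a rough sketch of turning the IN query from the question into a UNION ALL of single-value lookups. It runs against Python's built-in sqlite3 module purely as a stand-in, so it shows that the two forms return the same rows, not how MySQL plans either one. The helper name `union_all_query`, the `id` column and the sample data are assumptions for illustration.

```python
import sqlite3

def union_all_query(n):
    """Build n single-equality SELECTs joined with UNION ALL."""
    if n < 1:
        raise ValueError("need at least one value")
    branch = "SELECT id, name FROM person WHERE name = ?"
    return " UNION ALL ".join([branch] * n)

# Hypothetical sample data: two people named Joe, one non-matching row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_person_name ON person (name)")
conn.executemany("INSERT INTO person (name) VALUES (?)",
                 [("Joe",), ("Jane",), ("Bob",), ("Alice",), ("Carol",), ("Joe",)])

names = ["Joe", "Jane", "Bob", "Alice"]

# One equality lookup per name, concatenated without a DISTINCT pass.
union_rows = sorted(conn.execute(union_all_query(len(names)), names).fetchall())

# The original form from the question, for comparison.
in_rows = sorted(conn.execute(
    "SELECT id, name FROM person WHERE name IN (?, ?, ?, ?)", names).fetchall())
```

One caveat with this approach: if the same value appears twice in the list, UNION ALL returns its rows twice, whereas IN does not, so the list should be de-duplicated first.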
Upvotes: 11
Reputation: 20624
Are you measuring wall-clock time or query execution time? My guess is that the actual execution time for each of the four individual queries may add up to less than the time to execute the IN query, but the overall wall-clock time will be much longer for the four queries.
It will help to have an index on the name column.
Upvotes: 0
Reputation: 1306
If you're using the IN operator, it's not much different than saying:
(p.name = 'Joe' OR p.name = 'Jane' OR p.name = 'Bob' OR p.name = 'Alice')
Those are four conditions which must be checked for every row that the query must consider. Of course, each other query you cite has only one condition. I don't believe in most real-world scenarios doing four such queries would be faster, since you have to consider the time it takes for your client to read the result sets and do something with them. In that case, IN looks pretty nice; even better if it can use an index.
Upvotes: 1