Reputation: 63104

SQL IN clause slower than individual queries

I'm using Hibernate's JPA implementation with MySQL 5.0.67. MySQL is configured to use InnoDB.

In performing a JPA query (which is translated to SQL), I've discovered that using the IN clause is slower than performing individual queries. Example:

SELECT p FROM Person p WHERE p.name IN ('Joe', 'Jane', 'Bob', 'Alice')

is slower than four separate queries:

SELECT p FROM Person p WHERE p.name = 'Joe'
SELECT p FROM Person p WHERE p.name = 'Jane'
SELECT p FROM Person p WHERE p.name = 'Bob'
SELECT p FROM Person p WHERE p.name = 'Alice'

Why is this? Is this a MySQL performance limitation?

Upvotes: 5

Answers (6)

Abdelmomen

Reputation: 39

You can make the IN clause faster if you fetch the values first and embed them as literals in the IN clause, instead of embedding a subquery in the SQL statement.

Here is an example of using the IN clause:

Upvotes: 0

Fire Crow

Reputation: 7729

For me, the IN clause is an invaluable tool even if there is a slight lag compared with individual queries: it frees the database and tables up for other connections, and it brings application structure benefits.

The following technique is utilized in almost every PHP/MySQL application I construct.

I use the IN clause quite a bit with numerical keys. For example, grabbing a set of master items and all of their subitems could be:

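$master_arr = array();
$result = mysql_query("SELECT * FROM master_table WHERE master_id IN (1,7,9,10)");
while ($row = mysql_fetch_assoc($result)) {
    $master_arr[$row['master_id']] = $row; // index each master row by its id
}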

then:

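$subitem_arr = array();
$result = mysql_query("SELECT * FROM subitems_table WHERE par_master_id IN (1,7,9,10)");
while ($row = mysql_fetch_assoc($result)) {
    $subitem_arr[] = $row; // collect the subitem rows for the loop that follows
}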

Then add the subitems to the master items:

foreach($subitem_arr AS $sv){
     $m_key = $sv['par_master_id'];
     $s_key = $sv['subitem_id'];
     $master_arr[$m_key]['subitem'][$s_key] = $sv;
}

This does two things:

1. The tables are not all held at once, as they would be with a join.
2. Only two MySQL queries are needed to produce a tree of data.

Upvotes: 0

Harrison Fisk

Reputation: 7144

A query as simple as the IN demonstrated shouldn't have an issue with the optimizer choosing to use the index. The UNION work mentioned by Bill is only required occasionally when you have more complex queries. It could be an issue with index statistics.

Have you done an ANALYZE on the table in question?

How many rows are in the table and how many match the IN clause?

What does EXPLAIN say for the queries in question?
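
The three checks above can be sketched as a small script. This is only an illustration under loud assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the `person` table, its columns, the `idx_person_name` index, and the generated data are all hypothetical. On MySQL the corresponding statements would be `ANALYZE TABLE person` and `EXPLAIN SELECT ...`, and the plan output looks different.

```python
import sqlite3

# Hypothetical schema and data; the question does not show the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_person_name ON person (name)")
conn.executemany(
    "INSERT INTO person (name, age) VALUES (?, ?)",
    [(f"name{i}", 20 + i % 50) for i in range(1000)],
)

# 1. Refresh the index statistics (MySQL: ANALYZE TABLE person).
conn.execute("ANALYZE")

query = "SELECT * FROM person WHERE name IN ('name1', 'name2', 'name3', 'name4')"

# 2. Compare the table size with the number of rows the IN list matches.
total_rows = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
matching_rows = len(conn.execute(query).fetchall())

# 3. Ask the planner how it would run the query (MySQL: EXPLAIN <query>).
#    The last column of each plan row is the human-readable description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
uses_index = any("idx_person_name" in step for step in plan)

print(total_rows, matching_rows, uses_index)
for step in plan:
    print(step)
```

If the plan does not mention the index, or the IN list matches a large share of the table, that points at statistics or selectivity rather than at IN itself.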

Upvotes: 1

Bill Karwin

Reputation: 562681

This is a known deficiency in MySQL.

It is often true that using UNION performs better than a range query like the one you show. MySQL doesn't employ indexes very intelligently for expressions using IN (...). A similar hole exists in the optimizer for boolean expressions with OR.

See http://www.mysqlperformanceblog.com/2006/08/10/using-union-to-implement-loose-index-scan-to-mysql/ for some explanation and detailed benchmarks.

The optimizer is being improved all the time. A deficiency in one version of MySQL may be improved in a subsequent version. So it's worth testing your queries on different versions.

It is also advantageous to use UNION ALL instead of simply UNION. Both queries use a temporary table to store results, but the difference is that UNION applies DISTINCT to the result set, which incurs an additional un-indexed sort.
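
As a rough sketch of the rewrite, the script below builds the UNION ALL form of the query from the question and checks that it returns the same rows as the IN form. It runs against Python's built-in sqlite3 module purely as a stand-in, so it says nothing about MySQL's timings; the `person` table, the index name, and the sample rows are assumptions for illustration.

```python
import sqlite3

# Hypothetical schema and data; the question does not show the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_person_name ON person (name)")
conn.executemany(
    "INSERT INTO person (name) VALUES (?)",
    [("Joe",), ("Jane",), ("Bob",), ("Alice",), ("Carol",), ("Joe",)],
)

in_sql = "SELECT id, name FROM person WHERE name IN ('Joe', 'Jane', 'Bob', 'Alice')"

# One equality lookup per value, concatenated with UNION ALL so that no
# DISTINCT pass is applied to the combined result.
union_all_sql = """
    SELECT id, name FROM person WHERE name = 'Joe'
    UNION ALL
    SELECT id, name FROM person WHERE name = 'Jane'
    UNION ALL
    SELECT id, name FROM person WHERE name = 'Bob'
    UNION ALL
    SELECT id, name FROM person WHERE name = 'Alice'
"""

# Sort both results, since neither query guarantees an order.
in_rows = sorted(conn.execute(in_sql).fetchall())
union_rows = sorted(conn.execute(union_all_sql).fetchall())

print(in_rows == union_rows)
```

The two forms are only interchangeable when the values in the list are distinct; a value listed twice would return its rows twice under UNION ALL.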

Upvotes: 11

Barry Brown

Reputation: 20624

Are you measuring wall-clock time or query execution time? My guess is that the actual execution time for each of the four individual queries may add up to less than the time to execute the IN query, but the overall wall-clock time will be much longer for the four queries.

It will help to have an index on the name column.
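
One way to measure the wall-clock side of this is to time the whole call from the client, including fetching the rows. The sketch below does that for one IN query and for four separate queries. It is an assumption-laden illustration: it uses Python's built-in in-memory sqlite3 database, which has no network round trip, so the absolute numbers will not resemble a MySQL server's; the table, index, and data are hypothetical.

```python
import sqlite3
import time

# Hypothetical schema and data; the question does not show the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_person_name ON person (name)")
conn.executemany(
    "INSERT INTO person (name) VALUES (?)",
    [(n,) for n in ("Joe", "Jane", "Bob", "Alice", "Carol") * 200],
)

names = ["Joe", "Jane", "Bob", "Alice"]

# Wall-clock time for a single IN query, including fetching the rows.
start = time.perf_counter()
placeholders = ", ".join("?" for _ in names)
in_rows = conn.execute(
    f"SELECT id, name FROM person WHERE name IN ({placeholders})", names
).fetchall()
in_elapsed = time.perf_counter() - start

# Wall-clock time for four separate queries, including every call overhead.
start = time.perf_counter()
single_rows = []
for name in names:
    single_rows.extend(
        conn.execute("SELECT id, name FROM person WHERE name = ?", (name,)).fetchall()
    )
single_elapsed = time.perf_counter() - start

print(f"IN query:         {in_elapsed:.6f} s")
print(f"separate queries: {single_elapsed:.6f} s")
```

Against a real server, repeat each measurement several times and discard the first run, since caching can dominate a single sample.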

Upvotes: 0

Peter

Reputation: 1306

If you're using the IN operator, it's not much different than saying:

(p.name = 'Joe' OR p.name = 'Jane' OR p.name = 'Bob' OR p.name = 'Alice')

Those are four conditions which must be checked for every row that the query must consider, whereas each of the individual queries you cite has only one condition. Still, I don't believe that running four such queries would be faster in most real-world scenarios, since you have to consider the time it takes for your client to read the result sets and do something with them. In that case, IN looks pretty nice; even better if it can use an index.

Upvotes: 1
