Optimize query with two "in" clauses

Question

I've trying to optimize a query on MySQL that takes around 15-20 seconds to run. My Data table has about 10M rows, and the query is trying to return 68,000 records which match 144 "run" fields and 35 "name" fields. Because the query is using two in clauses, my indexes don't seem to be terribly helpful.

Here's the query:

select * from data d where 
d.data_type='Result' and 
(d.run in ('8a7aee1f2a6232b1012a624da9201b92', '8a7aee1f2a6232b1012a625432a314ef' , 

... [144 runs]

)) and (d.name like 'itema[%]' or d.name like 'itemb[%]')

Here's the table definition

CREATE TABLE `data` (
  `data_type` varchar(31) NOT NULL,
  `id` char(32) NOT NULL,
  `entry_time` datetime default NULL,
  `name` varchar(255) NOT NULL,
  `step` int(11) default NULL,
  `value` double NOT NULL,
  `run` char(32) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `FK2EEFAA8ECCC6F3` (`run`),
  KEY `data2` (`run`,`step`),
  KEY `data3` (`data_type`,`name(10)`,`run`),
  CONSTRAINT `FK2EEFAA8ECCC6F3` FOREIGN KEY (`run`) REFERENCES `run_archive` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Explain tells me the query is using key data3.

id      select_type     table   type    possible_keys   key     key_len ref    rows    Extra
1       SIMPLE  d    range   FK2EEFAA8ECCC6F3,data2,data3    data3    223     NULL    113271  Using where

I used to run 144 queries (one for each run). It seems about twice as fast to do one query, but still way too slow.

Suggestions for optimizing? Ideas I have are:

Finding a magic index that speeds
this up
Denormalizing data (it'd be easy to get rid of the run, but
harder for the name)
Splitting up the data among different tables (hard to do with my Java/Hibernate approach)

Or am I just asking the impossible here?

Edit: it turned out the biggest fix was to increase the size of my innodb_buffer_pool. The query went down to about 1 and a half seconds after doing this. I've marked as "answer" a fix that improved it slightly more.

meriton · Accepted Answer

Depending on how selective the condition on run is, it might be better to provide the index

data_type, run, name(10)

The trouble with providing the column used for prefix matching early in the index is that it scatters matching rows across the index, requiring a bigger part of the index to be read from disk.

Also, using a smaller datatype for the id of run will reduce index size and speed up comparisions. This is a constant factor improvement, but might be worthwhile regardless.

Optimize query with two "in" clauses

Answers (2)

Related Questions

Optimize query with two &quot;in&quot; clauses

Answers (2)

Related Questions

Optimize query with two "in" clauses