Will Glass
Will Glass

Reputation: 4850

Optimize query with two "in" clauses

I've trying to optimize a query on MySQL that takes around 15-20 seconds to run. My Data table has about 10M rows, and the query is trying to return 68,000 records which match 144 "run" fields and 35 "name" fields. Because the query is using two in clauses, my indexes don't seem to be terribly helpful.

Here's the query:

select * from data d where 
d.data_type='Result' and 
(d.run in ('8a7aee1f2a6232b1012a624da9201b92', '8a7aee1f2a6232b1012a625432a314ef' , 

... [144 runs]

)) and (d.name like 'itema[%]' or d.name like 'itemb[%]')

Here's the table definition

CREATE TABLE `data` (
  `data_type` varchar(31) NOT NULL,
  `id` char(32) NOT NULL,
  `entry_time` datetime default NULL,
  `name` varchar(255) NOT NULL,
  `step` int(11) default NULL,
  `value` double NOT NULL,
  `run` char(32) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `FK2EEFAA8ECCC6F3` (`run`),
  KEY `data2` (`run`,`step`),
  KEY `data3` (`data_type`,`name(10)`,`run`),
  CONSTRAINT `FK2EEFAA8ECCC6F3` FOREIGN KEY (`run`) REFERENCES `run_archive` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Explain tells me the query is using key data3.

id      select_type     table   type    possible_keys   key     key_len ref    rows    Extra
1       SIMPLE  d    range   FK2EEFAA8ECCC6F3,data2,data3    data3    223     NULL    113271  Using where

I used to run 144 queries (one for each run). It seems about twice as fast to do one query, but still way too slow.

Suggestions for optimizing? Ideas I have are:

Or am I just asking the impossible here?

Edit: it turned out the biggest fix was to increase the size of my innodb_buffer_pool. The query went down to about 1 and a half seconds after doing this. I've marked as "answer" a fix that improved it slightly more.

Upvotes: 2

Views: 169

Answers (2)

meriton
meriton

Reputation: 70564

Depending on how selective the condition on run is, it might be better to provide the index

data_type, run, name(10)

The trouble with providing the column used for prefix matching early in the index is that it scatters matching rows across the index, requiring a bigger part of the index to be read from disk.

Also, using a smaller datatype for the id of run will reduce index size and speed up comparisions. This is a constant factor improvement, but might be worthwhile regardless.

Upvotes: 0

p.campbell
p.campbell

Reputation: 100567

Consider splitting result records away from the data table? I didn't catch what percentage your result is, but perhaps it's worth benchmarking in a Dev copy of your Prod database.

Can you FK those run values? If they're reusable(?), perhaps create a Run table? My guesstimate is that 144 string matches, even indexed, is slower that if they were int or smallint. Again, benchmarking this suggestion, or any suggestion, will obviously prove the theory.

What does the query plan difference look like when not including your like clause on the name attribute?

Upvotes: 1

Related Questions