jfxu
jfxu

Reputation: 690

mysql select * by index is very slow

I have a table like this:

create table test (
    id int primary key auto_increment,
    idcard varchar(30),
    name varchar(30),
    custom_value varchar(50),
    index i1(idcard)
)

I insert 30,000,000 rows to the table

and then I execute:

  select * from test where idcard='?'

The statement cost 12 seconds to return

when I use iostat to monitor disk

the read speed is about 6 mb/s while the util is 94%

is any way to optimize it?

Upvotes: 2

Views: 2506

Answers (3)

Rick James
Rick James

Reputation: 142298

12 seconds may be realistic.

Assumptions about the question:

  • A total of 30M rows, but only 3000 rows in the resultset.
  • Not enough room to cache things in RAM or you are running from a cold start.
  • InnoDB or MyISAM (the analysis is the same; the details are radically different).
  • Any CHARACTER SET and COLLATION for idcard.
  • INDEX(idcard) exists and is used in the query.
  • HDD disk drive, not SSD.

Here's a breakdown of the processing:

  1. Go to the index, find the first entry with ?, scan forward until hitting an entry that is not ? (about 3K rows later).
  2. For each of those 3K items, reach into the table to find all the columns (cf SELECT *.
  3. Deliver them.

Step 1: Fast.

Step 2: This is (based on the assumption of not being cached) costly. It may involve about 3K disk hits. For an HDD, that would be about 30 seconds. So, 12 seconds could imply some of the stuff was cached or happened to be near each other.

Step 3: This is a network cost, which I am not considering.

Run the query a second time. It may take only 1 second the this time -- because all 3K blocks are cached in RAM! And iostat will show zero activity!

is any way to optimize it?

Well...

  • You already have the best index.
  • What are you going to do with 3000 rows all at once? Is this a one-time task?
  • When using InnoDB, innodb_buffer_pool_size should be about 70% of available RAM, but not so big that it leads to swapping. What is its setting, and how much RAM do you have and what else is running on the machine?
  • Could you do more of the task while you are fetching the 3K rows?
  • Switching to SSDs would help, but I don't like hardware bandaids; they are not reusable.
  • How big is the table (in GB) -- perhaps 3GB data plus index? (SHOW TABLE STATUS.) If you can't make the buffer_pool big enough for it, and you have a variety of queries that compete for different parts of this (and other) tables, then more RAM may be beneficial.

Upvotes: 2

EvE
EvE

Reputation: 750

Seems more like an I/O limitation than something that could be solved by adding indices. What will improve the speed is change the collation of the idcard column to latin1_bin. This uses only 1 byte per character. It also uses binary comparison which is faster than case insensitive comparison.

Only do this if you have no special characters in the idcard column, because the character set of latin1 is quite limited.

ALTER TABLE `test` CHANGE COLUMN `idcard` `idcard` VARCHAR(30) COLLATE 'latin1_bin' AFTER `id`;

Furthermore the ROW_FORMAT=FIXED also improves the speed. ROW_FORMAT=FIXED is not available using the InnoDB engine, but it is with MyISAM. The resulting table I now have is shown below. It's 5 times quicker (80% less time) with select statements than the initial table.

Note that I also changed the collation for 'name' and 'custom_value' to latin1_bin. This does make quite a difference in speed in my test setup, and I'm still figuring out why.

CREATE TABLE `test` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `idcard` VARCHAR(30) COLLATE 'latin1_bin',
    `name` VARCHAR(30) COLLATE 'latin1_bin',
    `custom_value` VARCHAR(50) COLLATE 'latin1_bin',
    PRIMARY KEY (`id`),
    INDEX `i1` (`idcard`)
)
ENGINE=MyISAM
ROW_FORMAT=FIXED ;

Upvotes: 0

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521239

You may try adding the three other columns in the select clause to the index:

CREATE INDEX idx ON test (idcard, id, name, custom_value);

The three columns other than idcard are being added to allow the index to cover everything being selected. The problem with your current index is that it is only on idcard. This means that once MySQL has traversed down to each leaf node in the index, it would have to do another seek back to the clustered index to lookup the values of all columns mentioned in the select *. As a result of this, MySQL may choose to ignore the index completely. The suggestion I made above avoids this additional seek.

Upvotes: -1

Related Questions