man
man

Reputation: 498

How to properly use indexing in MySQL

I'm running a fairly simple auto catalog

CREATE TABLE catalog_auto (
    id INT(10) UNSIGNED NOT NULL auto_increment,
    make varchar(35),
    make_t varchar(35),
    model varchar(40),
    model_t varchar(40),
    model_year SMALLINT(4) UNSIGNED,
    fuel varchar(35),
    gearbox varchar(15),
    wd varchar(5),
    engine_cc SMALLINT(4) UNSIGNED,
    variant varchar(40),
    body varchar(30),
    power_ps SMALLINT(4) UNSIGNED,
    power_kw SMALLINT(4) UNSIGNED,
    power_hp SMALLINT(4) UNSIGNED,
    max_rpm SMALLINT(5) UNSIGNED,
    torque SMALLINT(5) UNSIGNED,
    top_spd SMALLINT(5) UNSIGNED,
    seats TINYINT(2) UNSIGNED,
    doors TINYINT(1) UNSIGNED,
    weight_kg SMALLINT(5) UNSIGNED,
    lkm_def TINYINT(3) UNSIGNED,
    lkm_mix TINYINT(3) UNSIGNED,
    lkm_urb TINYINT(3) UNSIGNED,
    tank_cap TINYINT(3) UNSIGNED,
    co2 SMALLINT(5) UNSIGNED,
    PRIMARY KEY(id),
    INDEX `gi`(`make`,`model`,`model_year`,`fuel`,`gearbox`,`wd`,`engine_cc`),
    INDEX `mkt`(`make`,`make_t`),
    INDEX `mdt`(`make`,`model`,`model_t`)
);

The table has about 60.000 rows so far, so, nothing that simple queries, even without indexes, couldn't handle.

The point is, i'm trying to get the hang of using indexes, so i made a few, based on my most frequent queries.

Say i want engine_cc for a specific set of criteria like so:

SELECT DISTINCT engine_cc FROM catalog_auto WHERE make='audi' AND model='a4' and model_year=2006 AND fuel='diesel' AND gearbox='manual' AND wd='front';

EXPLAIN says:

+----+-------------+--------------+------+---------------+------+---------+-------------------------------------+------+--------------------------+
| id | select_type | table        | type | possible_keys | key  | key_len | ref                                 | rows | Extra                    |
+----+-------------+--------------+------+---------------+------+---------+-------------------------------------+------+--------------------------+
|  1 | SIMPLE      | catalog_auto | ref  | gi,mkt,mdt    | gi   | 408     | const,const,const,const,const,const |    8 | Using where; Using index |
+----+-------------+--------------+------+---------------+------+---------+-------------------------------------+------+--------------------------+

The query is using gi index as expected, no problem here.

After selecting base criteria, i need the rest of the columns as well:

SELECT * FROM catalog_auto WHERE make='audi' AND model='a4' and model_year=2006 AND fuel='diesel' AND gearbox='manual' AND wd='front' AND engine_cc=1968;

EXPLAIN says:

+----+-------------+--------------+------+---------------+------+---------+-------------------------------------------+------+-------------+
| id | select_type | table        | type | possible_keys | key  | key_len | ref                                       | rows | Extra       |
+----+-------------+--------------+------+---------------+------+---------+-------------------------------------------+------+-------------+
|  1 | SIMPLE      | catalog_auto | ref  | gi,mkt,mdt    | gi   | 411     | const,const,const,const,const,const,const |    3 | Using where |
+----+-------------+--------------+------+---------------+------+---------+-------------------------------------------+------+-------------+

It selected a KEY but NOT using the index. The query however, is very fast(1 row in set (0.00 sec)), but since the table doesn't have that many rows, i assume even without indexing, it would be the same.

Tried it like this:

SELECT * FROM catalog_auto WHERE id IN (SELECT id FROM catalog_auto WHERE make='audi' AND model='a6' AND model_year=2009);

Again, in EXPLAIN:

+----+--------------------+--------------+-----------------+--------------------+---------+---------+------+-------+-------------+
| id | select_type        | table        | type            | possible_keys      | key     | key_len | ref  | rows  | Extra       |
+----+--------------------+--------------+-----------------+--------------------+---------+---------+------+-------+-------------+
|  1 | PRIMARY            | catalog_auto | ALL             | NULL               | NULL    | NULL    | NULL | 59060 | Using where |
|  2 | DEPENDENT SUBQUERY | catalog_auto | unique_subquery | PRIMARY,gi,mkt,mdt | PRIMARY | 4       | func |     1 | Using where |
+----+--------------------+--------------+-----------------+--------------------+---------+---------+------+-------+-------------+

Still NOT using any index, not even PRIMARY KEY. Shouldn't this, at least use the PRIMARY KEY?

Documentation says: MySQL can ignore a key even if it finds one, if it determines that a full table scan would be faster, depending on the query.

Is that the reason why it's not using any of the indexes? Is this a good practice? If not, how would you recommend indexing columns, for a SELECT * statement, to always use an index, given the above query.

I'm not much of a MySQL expert, so any pointers would be greatly appreciated.

Using MySQL 5.5 with InnoDB.

Upvotes: 2

Views: 374

Answers (2)

Bill Karwin
Bill Karwin

Reputation: 562320

I'm basically saying the same answer that @DStanley said, but I want to expand on it more than I can fit in a comment.

The "Using index" note means that the query is using only the index to get the columns it needs.
The absence of this note doesn't mean the query isn't using an index.

What you should look at is the key column in the EXPLAIN report:

+----+-------------+--------------+------+---------------+------+---------+-------------------------------------------+------+-------------+
| id | select_type | table        | type | possible_keys | key  | key_len | ref                                       | rows | Extra       |
+----+-------------+--------------+------+---------------+------+---------+-------------------------------------------+------+-------------+
|  1 | SIMPLE      | catalog_auto | ref  | gi,mkt,mdt    | gi   | 411     | const,const,const,const,const,const,const |    3 | Using where |
+----+-------------+--------------+------+---------------+------+---------+-------------------------------------------+------+-------------+

The key column says the optimizer chooses to use the gi index. So it is using an index. And the ref column confirms that's referencing all seven columns of that index.

The fact that it must fetch more of the columns to return * means it can't claim "Using [only] index".

Also read this excerpt from https://dev.mysql.com/doc/refman/5.6/en/explain-output.html:

  • Using index

    The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row. This strategy can be used when the query uses only columns that are part of a single index.


I think of this analogy, to a telephone book:

If you look up a business in a phone book, it's efficient because the book is alphabetized by the name. When you find it, you also have the phone number right there in the same entry. So if that's all you need, it's very quick. That's an index-only query.

If you want extra information about the business, like their hours or credentials or whether they carry a certain product, you have to do the extra step of using that phone number to call them and ask. That's a couple of extra minutes of time to get that information. But you were still able to find the phone number without having to read the entire phone book, so at least it didn't take hours or days. That's a query that used an index, but had to also go look up the row from the table to get other data.

Upvotes: 3

D Stanley
D Stanley

Reputation: 152556

I'm not a MySQL expert, but my guess is that the index was used for the row lookup, but the actual data has to be retrieved from the data pages, so an additional lookup is necessary.

In your first query, the data you ask for is available by looking only at the index keys. When you ask for columns that aren't in the index in the second and third queries, the engine uses the key to do a SEEK on the data tables, so it's still very fast.

With SQL performance, since the optimizer has a lot of freedom to choose the "best" plan, the proof is in the pudding when it comes to indexing. If adding an index makes a common query faster, great, use it. If not, then save the space and overhead of maintaining the index (or look for a better index).

Note that you don't get a free lunch - additional indices can actually slow down a system, particularly if you have frequent inserts or updates on columns that are indexed, since the systme will have to constantly maintain those indices.

Upvotes: 3

Related Questions