NoActualName
NoActualName

Reputation: 71

MySQL clustered indexes

I need to create multi-column index in InnoDB and MyISAM. The task is to compare InnoDB cluster index with MyISAM non cluster index. I know, that id's in InnoDB are cluster index. According to MySQL documentation is it true that every multi-column index such:

create index Order_date_telno on orders (telno, date);

Implicitly stores cluster index in itself like:

create index Order_date_telno on orders (telno, date, id);

Documentation: All indexes other than the clustered index are known as secondary indexes. In InnoDB, each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index. InnoDB uses this primary key value to search for the row in the clustered index.

If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.

And how to see the difference between non cluster compsoite index and cluster composite index in MyISAM and InnoDB if the query doesn't contain id in itself? (even if it's, explain query doesn't show the influence of cluster index on query)

Upvotes: 0

Views: 445

Answers (1)

Rick James
Rick James

Reputation: 142208

The task is not as simple as you might think. There are many variants. Caching impacts InnoDB and MyISAM differently. Preliminary results are likely to mislead you. The engines lay out the data differently. A table that is bigger than RAM will have different performance characteristics. 'Churn' (lots of inserts/updates/deleted) may rearrange the data, leading to performance changes over time. Multi-column ("composite") indexes versus single-column indexes: possibly no difference between MyISAM and InnoDB. "Covering" is tricky.

Terminology: Please refer to it as a "clustered" index; leave "cluster" to refer to multiple servers sharing the load.

The one paragraph you grabbed from the Documentation is key to the main between the two engines' indexing.

MyISAM caching -- 1KB index blocks are cached in the key_buffer. Disk blocks are cached by the OS.

InnoDB caching -- Everything is in 16KB blocks cached in the buffer_pool.

All indexes (both engines) are "BTree", but InnoDB uses B+Tree, thereby slightly improving range scans.

InnoDB, as you say, clusters the Primary Key with the data. This says that a PK lookup can rapidly get to the entire row. A secondary key lookup, however, has to go through 2 BTree lookups -- one in the secondary key's BTree, then a second in the PK's BTree.

All indexes (excluding FULLTEXT and SPATIAL) are BTrees.

In MyISAM, first, the index is used, then the data is accessed via a direct pointer (either row number of byte offset) into a .MYD file. So, it is likely to be between the two InnoDB cases in performance for a secondary key lookup.

A "Covering" index is one that contains all the columns needed for a SELECT. That gives InnoDB an edge since the second lookup is avoided. In your example, INDEX(telno, date) for SELECT id FROM t WHERE telno=2 AND date=.. is not "covering" for MyISAM, but is covering for InnoDB (due to the implicit addition of id, thereby flipping the performance advantage to the other engine. Until you understand these details, your timings are likely to confound you.

Caching... As a rule of thumb, an InnoDB table occupies 2-3 times as much disk space as the equivalent MyISAM table. This leads to more I/O. In reality, other optimizations prevent it from leading to any easily measured speed difference. As a table grows past what can be cached, performance 'falls off a cliff'. This cliff happens sooner with InnoDB. (However, in today's computers, that does not happen until a table is many GB in size, so any casual performance test will miss it.)

Big rows and churn...

MyISAM handles DELETEs by marking the space in the .MYD as empty. Subsequent INSERTs will use that space in preference to 'extending' the file. For variable length rows, this can lead to a single row broken into chunks and scattered around the file. (Normally, MyISAM rows are intact; that is, the bytes are consecutive.) (This example leads to one of the very few cases for OPTIMIZE TABLE.)

InnoDB breaks up big rows into "on-record" and "off-record". (Cf "Row_format") Up to about 8KB of data is stored on-record in the BTree blocks mentioned above. Big columns are spilled to other blocks. So, there can be a big performance difference between SELECT a FROM ... and SELECT a, big_text_column FROM ... just because of how that big column is stored. (This is one reason to a void SELECT *.) MyISAM does not have that slowdown.

InnoDB does a reasonably good job of maintaining its BTrees. (Hence, OPTIMIZE is even less needed.)

To avoid having to explain these gory details, but to help with indexing, I have written a cookbook: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

Upvotes: 1

Related Questions