Reputation: 61
Are clustered index created and stored separately than the actual data in MySQL and if so, then why can we not have more than one clustered index. All we need to do is create another index and store it in memory.
Upvotes: 2
Views: 1456
Reputation: 2302
A clustered index is, at least partially, the way the table is physically sorted and stored, i.e. the row order on disk. That's why you can only have one. But because it reflects the physical layout of the rows, it's potentially more compact and performant than a typical index.
UPDATE: As @RickJames excellently points out below, in InnoDB (MySQL's default engine since 5.5.5), a lookup is typically a two-stage process. One b-tree relates a secondary key to a primary key, a second b-tree relates the primary key to the location of a data record. If retrieving data on primary key, only the second lookup is necessary. In that sense, a b-tree lookup is always necessary.
Additionally, according to the MySQL documentation:
Typically, the clustered index is synonymous with the primary key. 1
And the reason it's considered "clustered" and not just a primary key is because InnoDB attempts to order the data records according to primary key and leaves room for future records to be inserted in the correct location in its data pages 2.
Because of that, not only is a query on a InnoDB primary key one fewer b-tree lookup than a secondary index, but the primary key b-tree can be significantly smaller because of the physically ordering of the data on disk.
It stands to reason even if there were a mechanism to make a secondary index that pointed directly to a data record (like an index MyISAM), it wouldn't perform as well as InnoDB's primary/clustered index.
So, it's fundamentally the (at least partial) physical ordering of data records by primary key which prevents you from getting the same performance from a secondary index.
Upvotes: 2
Reputation: 142298
MySQL's InnoDB does the following for its PRIMARY KEY
: The data records are in PK order, stored together in a B+Tree structure. This allows for rapid point-queries and range scans. That is, the value at the 'bottom' of the tree has all the columns of the table.
InnoDB's secondary keys are also in a B+Tree, but the bottom values are the PK columns. Hence, a second lookup is needed to fetch a row(s) by a secondary key.
Note that a secondary key could contain all the columns of the table, thereby acting like a second clustered index. The drawback is that any modification to the table would necessarily involve changes to both BTrees.
MyISAM, in contrast, throws the data into a file (the .MYD) and has every index in its own BTree in the .MYI file. The bottom of each BTree is a pointer (row number or byte offset) into the .MYD. The PK is not implemented any differently than a secondary key.
(Note: FULLTEXT
and SPATIAL
indexes are not covered by the above discussion.)
Upvotes: 1