Reputation: 1275
Looking at the below 3 MySQL tables is it typical to have an index length much higher than the actual row count?
Also is there a limit to the length of the index before you start to degrade in performance rapidly, for instance the first table with an index length of 206 million plus?
table_rows data_length index_length Size in MB
7607749 5044389164 206542848 5007.68
3110749 1832710212 793864192 2504.9
4811507 1088374128 318001152 1341.22
Upvotes: 1
Views: 4866
Reputation: 142298
table_rows
is the number of rows in the table. This number is exact for MyISAM, but only approximate for InnoDB. data_length
is the number of bytes in the data portion of the table. For InnoDB, this includes the PRIMARY KEY
. index_length
is the number of bytes (not rows) for the indexes (excluding the PK if InnoDB).
If you have a large number of indexes, index_length can be bigger than data_length. This is a clue that you might have too many indexes, but that is not necessarily "bad".
Each index is stored as an independent BTree. When you add another index, you get another BTree; this does not impact the performance of the existing indexes.
Your tables have a few million rows; this implies that each BTree is about 4 levels deep. If the table grows to a billion rows, its BTrees will grow to about 5 levels. This is minor.
Degradation can occur when things get bigger. But it is not that simple.
Example 1: Your data has a datetime index or an auto_increment PRIMARY KEY and you always look only at "recent" rows. In this case, probably the "working set" is small enough to fit in RAM. You won't notice any performance degradation as the data & indexes grow.
Example 2: Some queries require scanning the entire table or an entire index. That blows out cache, and performance falls off a cliff.
Example 3: Index on a UUID. This is a very random index. The 'next' UUID you insert or select will have no relationship to the others you have touched recently. Hence, you are likely to need to hit the disk once the data / index is too big for RAM. Here, the performance gradually gets worse.
My point is that performance degradation is a combination of data/index size, access patterns, cache sizes and RAM size. Not just the numbers you are looking at.
Upvotes: 10