fifonik
fifonik

Reputation: 1606

Indexed VARCHAR's key_len

-- 2 Moderators: please do not re-format my code

DROP TABLE IF EXISTS `t`;

CREATE TABLE `t` (
      `v` VARCHAR(3) NOT NULL COLLATE 'latin1_general_ci'
    , `c` CHAR(3)    NOT NULL COLLATE 'latin1_general_ci'
    , INDEX `IX_t_v` (`v`)
    , INDEX `IX_t_c` (`c`)
)
;


INSERT INTO t
      (v, c)
VALUES
      ('001', '001')
    , ('002', '002')
    , ('003', '003')
    , ('004', '004')
;

EXPLAIN SELECT c FROM t WHERE c = '001';

EXPLAIN SELECT v FROM t WHERE v = '001';

The explain commands give me:

id  select_type table   partitions  type    possible_keys   key key_len ref rows    filtered    Extra
1   SIMPLE  t       ref IX_t_c  IX_t_c  3   const   1   100.00  Using index
id  select_type table   partitions  type    possible_keys   key key_len ref rows    filtered    Extra
1   SIMPLE  t       ref IX_t_v  IX_t_v  5   const   1   100.00  Using index

The question is: why key_len for the second query is 5?

Based on CHAR and VARCHAR Types, storage requirements for the VARCHAR(3) NOT NULL should be 4 so I expected the key_len would be the same!

What am I missing here?

Thanks.

P.S. MySQL 5.7 and MariaDB 10.1 gives the same results for me.

Upvotes: 2

Views: 644

Answers (2)

Rick James
Rick James

Reputation: 142518

key_len is, to some extent, a fabrication of EXPLAIN. The Engine may store VAR columns in some other way. Ditto for, multi-byte character sets. And there is the indication of NULL, which seems to be done in two or three different ways.

I see key_len useful for only one thing: How many columns of a composite index is it using. (With the advent of EXPLAIN FORMAT=JSON, this information is readily available.)

EXPLAIN's key_len value

  • n*m bytes, where n is the limit given (varchar(n)) and m is the potential number of bytes per character for the given character set (1 for latin1, 3 for utf8, 4 for utf8mb4)
  • plus 2 if VAR (varchar/varbinary)
  • plus 1 for NULL (even though NULLness might be stored using a single bit for the Engine and ROW_FORMAT in use.)

Related length issues:

  • utf8, for example, does not always use 3 bytes per character, so the length can be less than n*m given above.
  • VAR fields take only the length needed, plus the length indicator.
  • However, when a temp table is implemented via MEMORY, VARCHAR turns into a full-length CHAR. (This statement goes away with MySQL 8.0.) So, for that one situation, EXPLAIN's key_len may actually reflect the space used.

What do you use key_len for?

Upvotes: 5

Bill Karwin
Bill Karwin

Reputation: 562811

https://github.com/mysql/mysql-server/blob/8.0/sql/field.cc#L6790 says:

/**
  @note
    varstring and blob keys are ALWAYS stored with a 2 byte length prefix
*/

Upvotes: 5

Related Questions