Aalex Gabi
Aalex Gabi

Reputation: 1755

COLLATE in SQL statements on fields in utf8_bin slower than using the default collation?

Two scenarios:

Using the default collation:

CREATE TABLE IF NOT EXISTS `table` (
  `name` varchar(255) collate utf8_general_ci NOT NULL,
  UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

SELECT `name` FROM `table` ORDER BY `name`;

Using COLLATE:

CREATE TABLE IF NOT EXISTS `table` (
  `name` varchar(255) collate utf8_bin NOT NULL,
  UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

SELECT `name` FROM `table` ORDER BY `name` COLLATE utf8_general_ci;

I need to change from the first scenario to second because the index is case insensitive. Still ordering is important. There are experimental collations as utf8_general_cs but it requires special compilation.

Will this have an impact on the performance?

In my opinion if MySQL stores text fields internally in utf8 independent of collation it should not affect performance.

Edit: The output of explain in case COLLATE is used is the same as without.

mysql> EXPLAIN SELECT * 
    -> FROM `table`
    -> ORDER BY `name`
    -> COLLATE utf8_general_ci;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | table | ALL  | NULL          | NULL | NULL    | NULL |    5 | Using filesort | 
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * 
    -> FROM `table`
    -> ORDER BY `name`;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | table | ALL  | NULL          | NULL | NULL    | NULL |    5 | Using filesort | 
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)

Upvotes: 3

Views: 7245

Answers (2)

Pomyk
Pomyk

Reputation: 3328

If you don't have an index on that column I don't think it will be slower. With indexed column it would be slower.

With first table I get Extra field = "Using index;", with second table "Using index; Using filesort". So the second would be slower.

Upvotes: 0

hsanders
hsanders

Reputation: 1898

The collation determines the manner in which the column is considered for indexing as well as the manner in which comparisons are made. utf8 bin is meant for comparing strings by binary value, and utf8 general is meant for comparing by alphabetical value. What's considered a match will vary by collation and the order will vary by collation. If the column is treated as a binary (as in UTF8 bin) a character is equal to another character if and only if its bit value is equivalent.

When you specify a different collation in a select statement from the default collation of a field, you can't take advantage of the existing index (which uses the default collation). It should be about the same if you manually specify a collation on a query using an indexed column as it would be on a non-indexed column (provided the manually specified collation is different from the column's default collation) because it would simply ignore the index, in which case MySQL would use QuickSort (using a comparator based on the specified collation).

Upvotes: 4

Related Questions