Reputation: 13763
Currently I use:
utf8mb4
database character set.utf8mb4_unicode_520_ci
database collation.I understand that utf8mb4
supports up to four bytes per character. I also understand that Unicode is a standard that continues to get updates. In the past I thought utf8
was sufficient until I had some test data get corrupted, lesson learned. However I'm having difficulty understanding the upgrade path for both the character set and collations.
The utf8mb4_unicode_520_ci
database collation is based off of Unicode Collation Algorithm version 5.2.0. If you navigate to the parent directory you'll see up to version 14.0 listed at the time of typing this. Now those are the Unicode standards, then there is the supported MariaDB character sets and collations.
Offhand I'm not sure when the need to go from four bytes per character gets superseded to go to eight bytes per character or even 16 so it's not as simple a measure of just updating the database collation. Additionally I'm not seeing anything that seems to be newer than version 5.2.0 on MariaDB's documentation.
So in short my three highly related questions are:
utf8mb4
still sufficient for a character set or not?I am not bound to or care about MySQL compatibility.
Upvotes: 3
Views: 2915
Reputation: 14771
Skipping the old 5.2.0 versions, MariaDB added UCA-14.0.0 collations in MariaDB-10.10.2 and are also available in MariaDB-10.11+ versions.
The 14.0.0 collations also include accent insensitivity as an optional collation attribute.
Contractions are also supported in this version.
The list is:
MariaDB [(none)]> SELECT * FROM INFORMATION_SCHEMA.COLLATIONS where collation_name like 'uca1400_%';
+--------------------------------+--------------------+------+------------+-------------+---------+
| COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN |
+--------------------------------+--------------------+------+------------+-------------+---------+
| uca1400_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_icelandic_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_icelandic_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_icelandic_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_icelandic_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_icelandic_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_icelandic_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_icelandic_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_icelandic_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_latvian_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_latvian_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_latvian_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_latvian_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_latvian_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_latvian_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_latvian_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_latvian_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_romanian_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_romanian_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_romanian_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_romanian_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_romanian_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_romanian_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_romanian_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_romanian_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_slovenian_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_slovenian_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_slovenian_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_slovenian_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_slovenian_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_slovenian_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_slovenian_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_slovenian_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_polish_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_polish_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_polish_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_polish_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_polish_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_polish_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_polish_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_polish_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_estonian_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_estonian_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_estonian_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_estonian_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_estonian_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_estonian_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_estonian_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_estonian_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_spanish_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_spanish_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_spanish_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_spanish_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_spanish_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_spanish_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_spanish_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_spanish_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_swedish_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_swedish_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_swedish_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_swedish_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_swedish_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_swedish_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_swedish_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_swedish_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_turkish_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_turkish_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_turkish_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_turkish_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_turkish_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_turkish_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_turkish_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_turkish_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_czech_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_czech_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_czech_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_czech_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_czech_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_czech_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_czech_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_czech_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_danish_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_danish_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_danish_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_danish_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_danish_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_danish_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_danish_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_danish_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_lithuanian_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_lithuanian_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_lithuanian_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_lithuanian_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_lithuanian_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_lithuanian_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_lithuanian_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_lithuanian_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_slovak_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_slovak_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_slovak_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_slovak_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_slovak_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_slovak_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_slovak_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_slovak_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_spanish2_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_spanish2_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_spanish2_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_spanish2_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_spanish2_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_spanish2_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_spanish2_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_spanish2_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_roman_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_roman_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_roman_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_roman_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_roman_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_roman_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_roman_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_roman_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_persian_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_persian_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_persian_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_persian_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_persian_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_persian_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_persian_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_persian_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_esperanto_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_esperanto_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_esperanto_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_esperanto_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_esperanto_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_esperanto_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_esperanto_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_esperanto_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_hungarian_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_hungarian_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_hungarian_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_hungarian_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_hungarian_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_hungarian_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_hungarian_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_hungarian_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_sinhala_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_sinhala_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_sinhala_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_sinhala_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_sinhala_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_sinhala_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_sinhala_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_sinhala_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_german2_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_german2_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_german2_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_german2_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_german2_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_german2_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_german2_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_german2_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_vietnamese_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_vietnamese_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_vietnamese_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_vietnamese_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_vietnamese_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_vietnamese_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_vietnamese_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_vietnamese_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_croatian_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_croatian_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_croatian_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_croatian_as_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_croatian_nopad_ai_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_croatian_nopad_ai_cs | NULL | NULL | NULL | Yes | 8 |
| uca1400_croatian_nopad_as_ci | NULL | NULL | NULL | Yes | 8 |
| uca1400_croatian_nopad_as_cs | NULL | NULL | NULL | Yes | 8 |
+--------------------------------+--------------------+------+------------+-------------+---------+
184 rows in set (0.001 sec)
ref: MDEV-27009
Upvotes: 3
Reputation: 562981
You can inspect the collations currently supported by your MariaDB instance:
SELECT * FROM INFORMATION_SCHEMA.COLLATIONS
WHERE CHARACTER_SET_NAME = 'utf8mb4';
As far as I know, MariaDB does not support any UTF-8 collation version more current than utf8_unicode_520ci. If you try to use the '900' version, for example importing metadata from MySQL to MariaDB, you get errors.
There is no such thing as an 8-byte or 16-byte UTF-8 encoding. UTF-8 is an encoding that uses between 1 and 4 bytes per character, no more than that.
MariaDB also supports utf16 and utf32, but neither of these uses more than 4 bytes per character. Utf16 is variable-length, using one or two 16-bit code units per character. Utf32 is fixed-width, always using 32-bits (4 bytes) per character.
Upvotes: 1