John
John

Reputation: 13763

Upgrading MariaDB collation from utf8mb4_unicode_520_ci

Currently I use:

I understand that utf8mb4 supports up to four bytes per character. I also understand that Unicode is a standard that continues to get updates. In the past I thought utf8 was sufficient until I had some test data get corrupted, lesson learned. However I'm having difficulty understanding the upgrade path for both the character set and collations.

The utf8mb4_unicode_520_ci database collation is based off of Unicode Collation Algorithm version 5.2.0. If you navigate to the parent directory you'll see up to version 14.0 listed at the time of typing this. Now those are the Unicode standards, then there is the supported MariaDB character sets and collations.

Offhand I'm not sure when the need to go from four bytes per character gets superseded to go to eight bytes per character or even 16 so it's not as simple a measure of just updating the database collation. Additionally I'm not seeing anything that seems to be newer than version 5.2.0 on MariaDB's documentation.

So in short my three highly related questions are:

I am not bound to or care about MySQL compatibility.

Upvotes: 3

Views: 2915

Answers (2)

danblack
danblack

Reputation: 14771

Skipping the old 5.2.0 versions, MariaDB added UCA-14.0.0 collations in MariaDB-10.10.2 and are also available in MariaDB-10.11+ versions.

The 14.0.0 collations also include accent insensitivity as an optional collation attribute.

Contractions are also supported in this version.

The list is:

MariaDB [(none)]> SELECT * FROM INFORMATION_SCHEMA.COLLATIONS where collation_name like 'uca1400_%';
+--------------------------------+--------------------+------+------------+-------------+---------+
| COLLATION_NAME                 | CHARACTER_SET_NAME | ID   | IS_DEFAULT | IS_COMPILED | SORTLEN |
+--------------------------------+--------------------+------+------------+-------------+---------+
| uca1400_ai_ci                  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_ai_cs                  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_as_ci                  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_as_cs                  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_nopad_ai_ci            | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_nopad_ai_cs            | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_nopad_as_ci            | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_nopad_as_cs            | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_icelandic_ai_ci        | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_icelandic_ai_cs        | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_icelandic_as_ci        | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_icelandic_as_cs        | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_icelandic_nopad_ai_ci  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_icelandic_nopad_ai_cs  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_icelandic_nopad_as_ci  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_icelandic_nopad_as_cs  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_latvian_ai_ci          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_latvian_ai_cs          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_latvian_as_ci          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_latvian_as_cs          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_latvian_nopad_ai_ci    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_latvian_nopad_ai_cs    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_latvian_nopad_as_ci    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_latvian_nopad_as_cs    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_romanian_ai_ci         | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_romanian_ai_cs         | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_romanian_as_ci         | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_romanian_as_cs         | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_romanian_nopad_ai_ci   | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_romanian_nopad_ai_cs   | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_romanian_nopad_as_ci   | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_romanian_nopad_as_cs   | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_slovenian_ai_ci        | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_slovenian_ai_cs        | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_slovenian_as_ci        | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_slovenian_as_cs        | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_slovenian_nopad_ai_ci  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_slovenian_nopad_ai_cs  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_slovenian_nopad_as_ci  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_slovenian_nopad_as_cs  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_polish_ai_ci           | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_polish_ai_cs           | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_polish_as_ci           | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_polish_as_cs           | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_polish_nopad_ai_ci     | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_polish_nopad_ai_cs     | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_polish_nopad_as_ci     | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_polish_nopad_as_cs     | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_estonian_ai_ci         | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_estonian_ai_cs         | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_estonian_as_ci         | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_estonian_as_cs         | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_estonian_nopad_ai_ci   | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_estonian_nopad_ai_cs   | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_estonian_nopad_as_ci   | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_estonian_nopad_as_cs   | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_spanish_ai_ci          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_spanish_ai_cs          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_spanish_as_ci          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_spanish_as_cs          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_spanish_nopad_ai_ci    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_spanish_nopad_ai_cs    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_spanish_nopad_as_ci    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_spanish_nopad_as_cs    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_swedish_ai_ci          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_swedish_ai_cs          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_swedish_as_ci          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_swedish_as_cs          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_swedish_nopad_ai_ci    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_swedish_nopad_ai_cs    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_swedish_nopad_as_ci    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_swedish_nopad_as_cs    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_turkish_ai_ci          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_turkish_ai_cs          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_turkish_as_ci          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_turkish_as_cs          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_turkish_nopad_ai_ci    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_turkish_nopad_ai_cs    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_turkish_nopad_as_ci    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_turkish_nopad_as_cs    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_czech_ai_ci            | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_czech_ai_cs            | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_czech_as_ci            | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_czech_as_cs            | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_czech_nopad_ai_ci      | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_czech_nopad_ai_cs      | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_czech_nopad_as_ci      | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_czech_nopad_as_cs      | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_danish_ai_ci           | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_danish_ai_cs           | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_danish_as_ci           | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_danish_as_cs           | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_danish_nopad_ai_ci     | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_danish_nopad_ai_cs     | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_danish_nopad_as_ci     | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_danish_nopad_as_cs     | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_lithuanian_ai_ci       | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_lithuanian_ai_cs       | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_lithuanian_as_ci       | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_lithuanian_as_cs       | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_lithuanian_nopad_ai_ci | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_lithuanian_nopad_ai_cs | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_lithuanian_nopad_as_ci | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_lithuanian_nopad_as_cs | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_slovak_ai_ci           | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_slovak_ai_cs           | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_slovak_as_ci           | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_slovak_as_cs           | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_slovak_nopad_ai_ci     | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_slovak_nopad_ai_cs     | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_slovak_nopad_as_ci     | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_slovak_nopad_as_cs     | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_spanish2_ai_ci         | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_spanish2_ai_cs         | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_spanish2_as_ci         | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_spanish2_as_cs         | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_spanish2_nopad_ai_ci   | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_spanish2_nopad_ai_cs   | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_spanish2_nopad_as_ci   | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_spanish2_nopad_as_cs   | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_roman_ai_ci            | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_roman_ai_cs            | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_roman_as_ci            | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_roman_as_cs            | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_roman_nopad_ai_ci      | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_roman_nopad_ai_cs      | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_roman_nopad_as_ci      | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_roman_nopad_as_cs      | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_persian_ai_ci          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_persian_ai_cs          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_persian_as_ci          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_persian_as_cs          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_persian_nopad_ai_ci    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_persian_nopad_ai_cs    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_persian_nopad_as_ci    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_persian_nopad_as_cs    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_esperanto_ai_ci        | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_esperanto_ai_cs        | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_esperanto_as_ci        | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_esperanto_as_cs        | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_esperanto_nopad_ai_ci  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_esperanto_nopad_ai_cs  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_esperanto_nopad_as_ci  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_esperanto_nopad_as_cs  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_hungarian_ai_ci        | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_hungarian_ai_cs        | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_hungarian_as_ci        | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_hungarian_as_cs        | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_hungarian_nopad_ai_ci  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_hungarian_nopad_ai_cs  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_hungarian_nopad_as_ci  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_hungarian_nopad_as_cs  | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_sinhala_ai_ci          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_sinhala_ai_cs          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_sinhala_as_ci          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_sinhala_as_cs          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_sinhala_nopad_ai_ci    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_sinhala_nopad_ai_cs    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_sinhala_nopad_as_ci    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_sinhala_nopad_as_cs    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_german2_ai_ci          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_german2_ai_cs          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_german2_as_ci          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_german2_as_cs          | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_german2_nopad_ai_ci    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_german2_nopad_ai_cs    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_german2_nopad_as_ci    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_german2_nopad_as_cs    | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_vietnamese_ai_ci       | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_vietnamese_ai_cs       | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_vietnamese_as_ci       | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_vietnamese_as_cs       | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_vietnamese_nopad_ai_ci | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_vietnamese_nopad_ai_cs | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_vietnamese_nopad_as_ci | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_vietnamese_nopad_as_cs | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_croatian_ai_ci         | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_croatian_ai_cs         | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_croatian_as_ci         | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_croatian_as_cs         | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_croatian_nopad_ai_ci   | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_croatian_nopad_ai_cs   | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_croatian_nopad_as_ci   | NULL               | NULL | NULL       | Yes         |       8 |
| uca1400_croatian_nopad_as_cs   | NULL               | NULL | NULL       | Yes         |       8 |
+--------------------------------+--------------------+------+------------+-------------+---------+
184 rows in set (0.001 sec)

ref: MDEV-27009

Upvotes: 3

Bill Karwin
Bill Karwin

Reputation: 562981

You can inspect the collations currently supported by your MariaDB instance:

SELECT * FROM INFORMATION_SCHEMA.COLLATIONS
WHERE CHARACTER_SET_NAME = 'utf8mb4';

As far as I know, MariaDB does not support any UTF-8 collation version more current than utf8_unicode_520ci. If you try to use the '900' version, for example importing metadata from MySQL to MariaDB, you get errors.

There is no such thing as an 8-byte or 16-byte UTF-8 encoding. UTF-8 is an encoding that uses between 1 and 4 bytes per character, no more than that.

MariaDB also supports utf16 and utf32, but neither of these uses more than 4 bytes per character. Utf16 is variable-length, using one or two 16-bit code units per character. Utf32 is fixed-width, always using 32-bits (4 bytes) per character.

Upvotes: 1

Related Questions