Kjell Rilbe

Reputation: 1509

mariadb-dump with database name containing unicode characters?

I am trying to put together a mariadb-dump.exe command line to back up our MariaDB 10.11 databases on Windows (Swedish locale, Windows-1252 system code page). We assumed full Unicode support and, unfortunately, gave one database a name containing the Swedish character "ö".

I tried this command line:

mariadb-dump.exe --user=root --password=XXX --opt --result-file=backup.sql --default-character-set=utf8mb4 --quote-names dbföretag FirstTable SecondTable

I get this error:

mariadb-dump.exe: Error: 'Illegal mix of collations (utf8mb3_general_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation '='' when trying to dump tablespaces
mariadb-dump.exe: Got error: 1300: "Invalid utf8mb4 character string: 'dbf\xF6retag'" when selecting the database

I am trying to resolve the second error, which seems to indicate that mariadb-dump.exe fails to correctly encode the database name when sending it to the server, or the server incorrectly interprets the string when received.

I tried chcp 65001 in the cmd.exe session before running mariadb-dump.exe, but I get the exact same result.

The character "ö" has unicode codepoint U+00F6, which matches \xF6 in the error message, but UTF-8 encodes it as 0xC3 0xB6. Since I get this same result regardless of which chcp I use in the cmd.exe session, I conclude that mariadb-dump.exe correctly interprets the command line and understands that the "ö" is unicode codepoint U+00F6.

But it seems to fail to convert it into the encoding that should be sent to the server. Instead of encoding U+00F6 into utf-8 \xC3\xB6 it passes the unicode codepoint without conversion, as \xF6. I fail to see how that could work regardless of encoding. Is there ANY unicode encoding that uses 1 byte per character up to and including code point U+00F6?
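
For reference, the byte values discussed above can be checked with a short Python sketch (Python is not part of the original setup; it is used here only for illustration). It shows that U+00F6 becomes the two bytes C3 B6 in UTF-8, while the single byte F6 is what both Windows-1252 and ISO-8859-1 use for "ö". So the \xF6 in the error message would also be consistent with the name arriving in the ANSI code page rather than as an unconverted code point, though that is only one possible reading.

```python
# Compare how "ö" (U+00F6) is represented in the encodings discussed above.
ch = "\u00f6"  # ö

code_point = ord(ch)                 # numeric Unicode code point
utf8_bytes = ch.encode("utf-8")      # what a utf8mb4 connection expects
cp1252_bytes = ch.encode("cp1252")   # Windows-1252 (ANSI code page on a Swedish system)
latin1_bytes = ch.encode("latin-1")  # ISO-8859-1

print(hex(code_point), utf8_bytes, cp1252_bytes, latin1_bytes)

# The byte string quoted in error 1300 is not valid UTF-8 ...
raw = b"dbf\xf6retag"
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# ... but it decodes cleanly as Windows-1252.
as_cp1252 = raw.decode("cp1252")
print(valid_utf8, as_cp1252)
```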

As a workaround, I can artificially create the correct UTF-8 byte sequence by passing on the command line the two characters with Unicode code points U+00C3 and U+00B6, i.e. the characters Ã¶. Since it is the Unicode code points of those characters that matter, not how they are encoded in the cmd.exe session's code page, these two characters give the correct result regardless of which code page the cmd.exe session uses.

So, this command line works:

mariadb-dump.exe --user=root --password=XXX --opt --result-file=backup.sql --default-character-set=utf8mb4 --quote-names dbfÃ¶retag FirstTable SecondTable
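
The workaround name can also be generated instead of typed by hand. The following is a minimal Python sketch, assuming the behaviour described above (each character's code point ends up as one byte on the wire); the helper name `workaround_name` is hypothetical and not part of any MariaDB tool. It encodes the real name as UTF-8 and then maps each resulting byte to the character with the same code point, which is exactly what decoding as ISO-8859-1 does.

```python
def workaround_name(name: str) -> str:
    """Return the string whose code points equal the UTF-8 bytes of `name`.

    Hypothetical helper: assumes the client sends one byte per code point,
    as observed in the question above.
    """
    utf8 = name.encode("utf-8")    # e.g. "ö" -> b"\xc3\xb6"
    return utf8.decode("latin-1")  # each byte -> character with that code point


# "dbföretag" becomes the name used in the working command line.
print(workaround_name("dbf\u00f6retag"))
```

Note that for characters whose UTF-8 encoding contains bytes in the range 0x80 to 0x9F, the result contains control characters that may be hard to pass on a command line, so this sketch is only known to fit names like the one in this question.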

Is there any way I can get mariadb-dump.exe to encode the database name into utf-8 correctly?

I tried adding these lines to my.ini, but it doesn't help:

[client]
character_set_connection=utf8mb4
collation_connection=utf8mb4_bin

Is this a bug in mariadb-dump.exe? In the server? In the MariaDB client library being used by mariadb-dump.exe? Or what?

UPDATE: Bug reported to MariaDB: https://jira.mariadb.org/browse/MDEV-32264

UPDATE: According to the replies on the bug report above, the current implementation is said to work on newer Windows versions. The version we run (Server 2019) will soon be out of mainstream support, and making it work on that older Windows version would take significant effort, so they won't fix anything. Instead, we will plan an upgrade to Server 2022 and hope that the problem goes away. In the meantime, we are using the workaround.

Upvotes: 0

Views: 156

Answers (0)
