Roel van Duijnhoven

Reputation: 850

GCP MySQL Cloud SQL database does not properly export multibyte UTF-8 characters

When I create a backup of my MySQL Cloud SQL database, it does not export correctly the UTF-8 multibyte characters that it contains.

I feel this is a bug. But it could be that I am missing something. I hope somebody can help me out!

Steps to reproduce:

  1. Create a new Cloud SQL MySQL instance, using version 5.7. Be sure to set the flag character_set_server to utf8mb4.

  2. Create a new database with character set utf8mb4 and collation utf8mb4_unicode_ci.

  3. Populate the database:

    SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_ci';
    CREATE TABLE `slug` (
     `slug` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
    INSERT INTO demo.slug VALUES ('🙊🇩🇪');
    
  4. I validated that I can get those values back properly using the following:

    SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_ci';
    SELECT * FROM demo.slug;
    
  5. Use the built-in export mechanism that the Console provides (a gcloud equivalent is sketched after these steps).

    [screenshot of the export dialog in the Cloud SQL Console]

  6. When you download the resulting SQL file, the relevant INSERT statement looks as follows, containing question marks instead of the valid UTF-8 characters.

    INSERT INTO `slug` VALUES ('???');
    

    Note that I also get that exact value when I omit the SET NAMES part in step 4.
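For completeness, the same setup and export can presumably also be driven from the command line. The instance name, tier and bucket below are just placeholders, and I have not verified whether the gcloud export behaves any differently from the Console export:

    # Create the instance with the character_set_server flag set (tier is just an example)
    gcloud sql instances create [YOUR_INSTANCE] \
        --database-version=MYSQL_5_7 \
        --tier=db-n1-standard-1 \
        --database-flags=character_set_server=utf8mb4

    # Create the demo database with the utf8mb4 charset and collation
    gcloud sql databases create demo \
        --instance=[YOUR_INSTANCE] \
        --charset=utf8mb4 \
        --collation=utf8mb4_unicode_ci

    # Export the database to a bucket, as the Console export button does
    gcloud sql export sql [YOUR_INSTANCE] gs://[YOUR_BUCKET_NAME]/demo-export.sql \
        --database=demo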

Upvotes: 2

Views: 1053

Answers (3)

Roel van Duijnhoven

Reputation: 850

I got a reply from Google: engineering is aware of this, but there is no ETA on a change. So there is no way to fix this for now. In the meantime the only way to make a valid backup is to follow iker's instructions.

Upvotes: 0

RedPandaCurios

Reputation: 2324

Edit: Ignore the below -- the success may have been because the bytes making up the characters I tested were also valid latin-1 characters...

It seems that this only affects 4-byte UTF-8 encoded characters (which includes all the emoticons, etc.)

Trying with a 3-byte character such as Cyrillic Zemlya (Ꙁ, UTF-8 = ea9980) or a 2-byte character such as Greek epsilon (ε, UTF-8 = ceb5) succeeds.
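To see what actually ends up in the table after an export/import round trip, you can inspect the raw bytes of each stored value. Something along these lines should work (host and user are placeholders, assuming your IP is whitelisted on the instance):

    # Show each value together with its hex bytes, byte length and character length
    mysql -h [YOUR_INSTANCE_PUBLIC_IP] -u [YOUR_USER] -p --default-character-set=utf8mb4 \
        -e "SELECT slug, HEX(slug) AS hex_bytes, LENGTH(slug) AS bytes, CHAR_LENGTH(slug) AS chars FROM demo.slug;"

A 4-byte character shows up as 4 bytes per character here; if the round trip turned it into question marks you will see 3f bytes instead.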

Note that Cloud SQL backups do not use exports, so they are not affected.
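If you only need a restorable copy rather than an importable SQL file, an on-demand backup can also be taken from the CLI (instance name is a placeholder):

    # Take an on-demand backup; backups stay server-side and do not go through the export path
    gcloud sql backups create --instance=[YOUR_INSTANCE]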

Upvotes: 0

iker lasaga

Reputation: 394

I tried doing it through the Console as you did, and I ran into the same issue. The workaround that worked for me is this one:

1) Whitelist your machine's public IP in

STORAGE > SQL > your instance > CONNECTIONS > Connectivity > +Add network

You can get the public IP by running this command:

    dig +short myip.opendns.com @resolver1.opendns.com

2) Create a .sql file by running this command:

    mysqldump --databases [YOUR_DB_NAME] -h [YOUR_INSTANCE_PUBLIC_IP] -u [YOUR_USER] -p --default-character-set=utf8mb4 > [FILE_NAME].sql

3) Move the file into a bucket by running this command:

    gsutil mv [FILE_NAME].sql gs://[YOUR_BUCKET_NAME]/

4) Then you import the file via the Console, selecting the database you want (a gcloud alternative is sketched below).

I have tried it and it works for me.
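For what it's worth, step 4 can presumably also be done from the command line instead of the Console (same placeholders as above; I have only tried the Console import myself):

    # Import the dump from the bucket into the target database
    gcloud sql import sql [YOUR_INSTANCE_NAME] gs://[YOUR_BUCKET_NAME]/[FILE_NAME].sql \
        --database=[YOUR_DB_NAME]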

Upvotes: 2
