Pierre
Pierre

Reputation: 1256

Fix database encoded through PDO without specifying UTF8 charset

Although I set up my web app to be fully compliant with UTF8, I forgot to specify the UTF-8 charset in the PDO settings... It happens that the Chinese characters looks strange in the database/phpmyadmin (你好 for 你好), although they look fine in my web app.

As I finally know the solution (to add the charset in the PDO connection settings), I would like to know if there is an easy way to convert my old data to a proper UTF-8. I found on Stackoverflow the following code that does the job, but it solves the encoding problem column per column, table per table..

UPDATE tablename SET
field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8);

Does it exist a way to perform such conversion to the whole database?

I precise that my MySQL databases/tables are already configured to use the charset UTF-8.

Thanks!

Upvotes: 0

Views: 895

Answers (1)

Rick James
Rick James

Reputation: 142518

You have "double encoding". I believe these steps 'caused' it:

  1. The bytes (for, say, 你好) to be stored were correctly encoded utf8 (E4BDA0 E5A5BD).
  2. The CHARACTER SET for the column was correctly utf8 (or utf8mb4). (It could have been latin1, with same outcome.)
  3. The "connection" was set to latin1 when you INSERTed. This is where the error was. That declared that the data being stored was encoded as latin1, these 6 characters: hex E4 BD A0 E5 A5 BD. During the storing, they were converted to the columns charset (utf8) to become C3A4 C2BD C2A0 C3A5 C2A5 C2BD.
  4. You correctly SELECTed the data as utf8. But, by now, that gave you this mess: 你好.

Two things are wrong; both need fixing simultaneously:

  • Step 2 needs changing to establish that UTF-8 is the client's encoding. See PHP. Do something like $db = new PDO('dblib:host=host;dbname=db;charset=UTF8', $user, $pwd); for PDO.
  • The data needs fixing. Your UPDATE looks good for this case.

Upvotes: 1

Related Questions