Kelvera
Kelvera

Reputation: 25

Convert charset Latin1 to Utf-8 in ALL columns

I have a huge bank in Latin1 and I need to convert ALL the values of ALL the columns of the bank.

Searching discovered a command that does this manually.

UPDATE table SET column = CONVERT (cast (CONVERT (column USING latin1) AS BINARY) USING utf8);

But the bank has a lot of tables and columns, so it's impractical to do the conversion manually, could you help me with some script that would bring that command up to all the columns in the bank?

Upvotes: 1

Views: 5796

Answers (3)

Philippe - Agence Lex
Philippe - Agence Lex

Reputation: 11

May case : The table is CHARACTER SET utf8mb4 but some columns had lain1 text (after an upgrade from MySQL 5.7 to 8.0)

Solution that worked for me :

BACKUP YOUR BD FIRST

To preview the result before execution :

SELECT column_name, CONVERT(CAST(CONVERT(column_name USING LATIN1) AS BINARY) USING UTF8MB4) AS converted_column_name FROM table_name

To convert :

UPDATE table_name set column_name = CONVERT(CAST(CONVERT(column_name USING LATIN1) AS BINARY) USING UTF8MB4) WHERE CONVERT(CAST(CONVERT(column_name USING LATIN1) AS BINARY) USING UTF8MB4) IS NOT NULL;

Do that for each column.


Why the WHERE clause...? Because converting a text that has already been converted in UTF8 will set your column_name to NULL

Upvotes: 1

Rick James
Rick James

Reputation: 142208

NO -- First let's decide what the full situation is.

What version are you using? If MySQL 5.7, consider going to utf8mb4 so that you can handle Emoji and all of Chinese. If 5.5 or 5.6, that is also possible, but you might run into some problems.

http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases

Case 1: The columns are currently CHARACTER SET latin1 and contain only latin1-encoded text. Then do this for each table:

ALTER TABLE t CONVERT TO CHARACTER SET utf8;

Case 2: The columns are currently CHARACTER SET latin1 but you have utf8-encoded characters in them. This leads to Mojibake or the silent "double encoding". The fix needs a pair of alters for each column:

Case 3 (double encoding): Then, and only then, this is called for:

UPDATE tbl SET col = CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4);

More discussion

CHARACTER SET latin1, but have utf8 bytes in it; leave bytes alone while fixing charset: First, lets assume you have this declaration for tbl.col:

col VARCHAR(111) CHARACTER SET latin1 NOT NULL

Then to convert the column without changing the bytes:

ALTER TABLE tbl MODIFY COLUMN col VARBINARY(111) NOT NULL;
ALTER TABLE tbl MODIFY COLUMN col VARCHAR(111) CHARACTER SET utf8mb4 NOT NULL;

Note: If you start with TEXT, use BLOB as the intermediate definition. (This is the "2-step ALTER, as discussed elsewhere.) (Be sure to keep the other specifications the same - VARCHAR, NOT NULL, etc.)

Which case??

In order to determine which case you have, please provide a small sample of the current data via:

 SELECT HEX(col), col FROM t WHERE ...

Example: If the col has é, and the HEX is E9 -- that's latin1. If the Hex is C3A9, it's utf8 improperly stored into latin1. Hex of C383C2A9 would indicate "double encoding".

Generate ALTERs

A tip on how to generate the ALTERs can be found here . (It is not exactly what you need, but close.)

Upvotes: 5

Bleach
Bleach

Reputation: 561

Create a new table with the same exact layout of the latin table but specify utf8 in the create table.

Then

INSERT INTO new_table SELECT * 
FROM latin_table

Upvotes: 0

Related Questions