Reputation: 25
I have a huge bank in Latin1 and I need to convert ALL the values of ALL the columns of the bank.
Searching discovered a command that does this manually.
UPDATE table SET column = CONVERT (cast (CONVERT (column USING latin1) AS BINARY) USING utf8);
But the bank has a lot of tables and columns, so it's impractical to do the conversion manually, could you help me with some script that would bring that command up to all the columns in the bank?
Upvotes: 1
Views: 5796
Reputation: 11
May case : The table is CHARACTER SET utf8mb4 but some columns had lain1 text (after an upgrade from MySQL 5.7 to 8.0)
Solution that worked for me :
BACKUP YOUR BD FIRST
To preview the result before execution :
SELECT column_name, CONVERT(CAST(CONVERT(column_name USING LATIN1) AS BINARY) USING UTF8MB4) AS converted_column_name FROM table_name
To convert :
UPDATE table_name set column_name = CONVERT(CAST(CONVERT(column_name USING LATIN1) AS BINARY) USING UTF8MB4) WHERE CONVERT(CAST(CONVERT(column_name USING LATIN1) AS BINARY) USING UTF8MB4) IS NOT NULL;
Do that for each column.
Why the WHERE clause...? Because converting a text that has already been converted in UTF8 will set your column_name to NULL
Upvotes: 1
Reputation: 142208
NO -- First let's decide what the full situation is.
What version are you using? If MySQL 5.7, consider going to utf8mb4 so that you can handle Emoji and all of Chinese. If 5.5 or 5.6, that is also possible, but you might run into some problems.
http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
Case 1: The columns are currently CHARACTER SET latin1
and contain only latin1-encoded text. Then do this for each table:
ALTER TABLE t CONVERT TO CHARACTER SET utf8;
Case 2: The columns are currently CHARACTER SET latin1
but you have utf8-encoded characters in them. This leads to Mojibake or the silent "double encoding". The fix needs a pair of alters for each column:
Case 3 (double encoding): Then, and only then, this is called for:
UPDATE tbl SET col = CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4);
More discussion
CHARACTER SET latin1, but have utf8 bytes in it; leave bytes alone while fixing charset: First, lets assume you have this declaration for tbl.col:
col VARCHAR(111) CHARACTER SET latin1 NOT NULL
Then to convert the column without changing the bytes:
ALTER TABLE tbl MODIFY COLUMN col VARBINARY(111) NOT NULL;
ALTER TABLE tbl MODIFY COLUMN col VARCHAR(111) CHARACTER SET utf8mb4 NOT NULL;
Note: If you start with TEXT, use BLOB as the intermediate definition. (This is the "2-step ALTER, as discussed elsewhere.) (Be sure to keep the other specifications the same - VARCHAR, NOT NULL, etc.)
Which case??
In order to determine which case you have, please provide a small sample of the current data via:
SELECT HEX(col), col FROM t WHERE ...
Example: If the col
has é
, and the HEX is E9
-- that's latin1. If the Hex is C3A9
, it's utf8 improperly stored into latin1. Hex of C383C2A9
would indicate "double encoding".
Generate ALTERs
A tip on how to generate the ALTERs
can be found here . (It is not exactly what you need, but close.)
Upvotes: 5
Reputation: 561
Create a new table with the same exact layout of the latin table but specify utf8 in the create table.
Then
INSERT INTO new_table SELECT *
FROM latin_table
Upvotes: 0