Reputation: 4187
I've got a database with a bunch of broken utf8 characters scattered across several tables. The list of characters isn't very extensive AFAIK (áéíúóÁÉÍÓÚÑñ).
Fixing a given table is very straightforward:
update orderItem set itemName=replace(itemName,'Ã¡','á');
But I can't find a way of detecting the broken characters. If I do something like
SELECT * FROM TABLE WHERE field LIKE "%Ã%";
I get nearly all the rows because of the collation (Ã=a). All the broken characters so far start with an "Ã". The database is in Spanish, so this particular character isn't otherwise used.
The list of broken characters I've found so far is:
Ã¡ = á
Ã© = é
Ã- = í
Ã³ = ó
Ã± = ñ
á = Á
Any idea how to make this SELECT work as intended (a binary search or something like that)?
Upvotes: 51
Views: 98452
Reputation: 3245
To convert all the broken Latin characters back to the correct accented ones, try this in MySQL:
UPDATE your_table SET your_column = CONVERT(CAST(CONVERT(your_column USING latin1) AS BINARY) USING utf8);
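A rough Python emulation of what that expression does to a single value may make the round trip clearer. It assumes MySQL's latin1 behaves like Windows-1252, with the five bytes cp1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) mapped straight to the matching control characters. `latin1_encode` and `fix_mojibake` are illustrative names, not MySQL functions.

```python
# Assumption: MySQL's latin1 is cp1252, with the five bytes cp1252 leaves
# unassigned mapped straight to the control characters U+0081, U+008D, ...
UNASSIGNED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def latin1_encode(text: str) -> bytes:
    """Turn each character back into its single latin1 byte."""
    return b"".join(
        bytes([ord(ch)]) if ord(ch) in UNASSIGNED else ch.encode("cp1252")
        for ch in text
    )


def fix_mojibake(text: str) -> str:
    """Emulates CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8)."""
    raw = latin1_encode(text)  # CONVERT(... USING latin1) + CAST(... AS BINARY)
    return raw.decode("utf-8")  # CONVERT(... USING utf8): read the bytes as UTF-8


print(fix_mojibake("NiÃ±o"))  # Niño
```

On a value that was never broken, such as 'Pâques', the final decode raises UnicodeDecodeError, and characters latin1 cannot represent raise UnicodeEncodeError. Those are exactly the rows the SQL version can damage.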
Upvotes: 7
Reputation: 6055
This query helped me identify rows that had bad characters in them. Basically, you look for rows where the field isn't NULL but becomes NULL after converting it to UTF-8:
select ach.*
from ach_warehouse ach
where addendum is not null and convert(addendum using utf8) is null;
Upvotes: 0
Reputation: 241
Thanks for your answers!
I fixed my tables with this and wanted to share the full list of replacements. Note that besides the Latin ones, it also fixes HTML-encoded characters (entities); it was really a mess:
(If you need more conversions, look them up at https://www.utf8-chartable.de/unicode-utf8-table.pl)
update `table` set `field` = replace(`field`, 'â€', '"');
update `table` set `field` = replace(`field`, 'â€“', '–');
update `table` set `field` = replace(`field`, 'â€¢', '-');
update `table` set `field` = replace(`field`, 'â€œ', '"');
update `table` set `field` = replace(`field`, 'Â¡', '¡');
update `table` set `field` = replace(`field`, 'Â¢', '¢');
update `table` set `field` = replace(`field`, 'Â£', '£');
update `table` set `field` = replace(`field`, 'Â¤', '¤');
update `table` set `field` = replace(`field`, 'Â¥', '¥');
update `table` set `field` = replace(`field`, 'Â¦', '¦');
update `table` set `field` = replace(`field`, 'Â§', '§');
update `table` set `field` = replace(`field`, 'Â¨', '¨');
update `table` set `field` = replace(`field`, 'Â©', '©');
update `table` set `field` = replace(`field`, 'Âª', 'ª');
update `table` set `field` = replace(`field`, 'Â«', '«');
update `table` set `field` = replace(`field`, 'Â¬', '¬');
# This one looks like it's missing a character, but it's there. 0xad
update `table` set `field` = replace(`field`, 'Â', '');
update `table` set `field` = replace(`field`, 'Â®', '®');
update `table` set `field` = replace(`field`, 'Â¯', '¯');
update `table` set `field` = replace(`field`, 'Â°', '°');
update `table` set `field` = replace(`field`, 'Â±', '±');
update `table` set `field` = replace(`field`, 'Â²', '²');
update `table` set `field` = replace(`field`, 'Â³', '³');
update `table` set `field` = replace(`field`, 'Â´', '´');
update `table` set `field` = replace(`field`, 'Âµ', 'µ');
update `table` set `field` = replace(`field`, 'Â¶', '¶');
update `table` set `field` = replace(`field`, 'Â·', '·');
update `table` set `field` = replace(`field`, 'Â¸', '¸');
update `table` set `field` = replace(`field`, 'Â¹', '¹');
update `table` set `field` = replace(`field`, 'Âº', 'º');
update `table` set `field` = replace(`field`, 'Â»', '»');
update `table` set `field` = replace(`field`, 'Â¼', '¼');
update `table` set `field` = replace(`field`, 'Â½', '½');
update `table` set `field` = replace(`field`, 'Â¾', '¾');
update `table` set `field` = replace(`field`, 'Â¿', '¿');
update `table` set `field` = replace(`field`, 'Ã€', 'À');
# This one looks like it's missing a character, but it's there. 0x81
update `table` set `field` = replace(`field`, 'Ã', 'Á');
update `table` set `field` = replace(`field`, 'Ã‚', 'Â');
update `table` set `field` = replace(`field`, 'Ãƒ', 'Ã');
update `table` set `field` = replace(`field`, 'Ã„', 'Ä');
update `table` set `field` = replace(`field`, 'Ã…', 'Å');
update `table` set `field` = replace(`field`, 'Ã†', 'Æ');
update `table` set `field` = replace(`field`, 'Ã‡', 'Ç');
update `table` set `field` = replace(`field`, 'Ãˆ', 'È');
update `table` set `field` = replace(`field`, 'Ã‰', 'É');
update `table` set `field` = replace(`field`, 'ÃŠ', 'Ê');
update `table` set `field` = replace(`field`, 'Ã‹', 'Ë');
update `table` set `field` = replace(`field`, 'ÃŒ', 'Ì');
# This one looks like it's missing a character, but it's there. 0x8d
update `table` set `field` = replace(`field`, 'Ã', 'Í');
update `table` set `field` = replace(`field`, 'ÃŽ', 'Î');
# This one looks like it's missing a character, but it's there. 0x8f
update `table` set `field` = replace(`field`, 'Ã', 'Ï');
# This one looks like it's missing a character, but it's there. 0x90
update `table` set `field` = replace(`field`, 'Ã', 'Ð');
update `table` set `field` = replace(`field`, 'Ã‘', 'Ñ');
update `table` set `field` = replace(`field`, 'Ã’', 'Ò');
update `table` set `field` = replace(`field`, 'Ã“', 'Ó');
update `table` set `field` = replace(`field`, 'Ã”', 'Ô');
update `table` set `field` = replace(`field`, 'Ã•', 'Õ');
update `table` set `field` = replace(`field`, 'Ã–', 'Ö');
update `table` set `field` = replace(`field`, 'Ã—', '×');
update `table` set `field` = replace(`field`, 'Ã˜', 'Ø');
update `table` set `field` = replace(`field`, 'Ã™', 'Ù');
update `table` set `field` = replace(`field`, 'Ãš', 'Ú');
update `table` set `field` = replace(`field`, 'Ã›', 'Û');
update `table` set `field` = replace(`field`, 'Ãœ', 'Ü');
# This one looks like it's missing a character, but it's there. 0x9d
update `table` set `field` = replace(`field`, 'Ã', 'Ý');
update `table` set `field` = replace(`field`, 'Ãž', 'Þ');
update `table` set `field` = replace(`field`, 'ÃŸ', 'ß');
update `table` set `field` = replace(`field`, 'Ã ', 'à');
update `table` set `field` = replace(`field`, 'Ã¡', 'á');
update `table` set `field` = replace(`field`, 'Ã¢', 'â');
update `table` set `field` = replace(`field`, 'Ã£', 'ã');
update `table` set `field` = replace(`field`, 'Ã¤', 'ä');
update `table` set `field` = replace(`field`, 'Ã¥', 'å');
update `table` set `field` = replace(`field`, 'Ã¦', 'æ');
update `table` set `field` = replace(`field`, 'Ã§', 'ç');
update `table` set `field` = replace(`field`, 'Ã¨', 'è');
update `table` set `field` = replace(`field`, 'Ã©', 'é');
update `table` set `field` = replace(`field`, 'Ãª', 'ê');
update `table` set `field` = replace(`field`, 'Ã«', 'ë');
update `table` set `field` = replace(`field`, 'Ã¬', 'ì');
# This one looks like it's missing a character, but it's there. 0xad
update `table` set `field` = replace(`field`, 'Ã', 'í');
update `table` set `field` = replace(`field`, 'Ã®', 'î');
update `table` set `field` = replace(`field`, 'Ã¯', 'ï');
update `table` set `field` = replace(`field`, 'Ã°', 'ð');
update `table` set `field` = replace(`field`, 'Ã±', 'ñ');
update `table` set `field` = replace(`field`, 'Ã²', 'ò');
update `table` set `field` = replace(`field`, 'Ã³', 'ó');
update `table` set `field` = replace(`field`, 'Ã´', 'ô');
update `table` set `field` = replace(`field`, 'Ãµ', 'õ');
update `table` set `field` = replace(`field`, 'Ã¶', 'ö');
update `table` set `field` = replace(`field`, 'Ã·', '÷');
update `table` set `field` = replace(`field`, 'Ã¸', 'ø');
update `table` set `field` = replace(`field`, 'Ã¹', 'ù');
update `table` set `field` = replace(`field`, 'Ãº', 'ú');
update `table` set `field` = replace(`field`, 'Ã»', 'û');
update `table` set `field` = replace(`field`, 'Ã¼', 'ü');
update `table` set `field` = replace(`field`, 'Ã½', 'ý');
update `table` set `field` = replace(`field`, 'Ã¾', 'þ');
update `table` set `field` = replace(`field`, 'Ã¿', 'ÿ');
update `table` set `field` = replace(`field` ,'ç','ç');
update `table` set `field` = replace(`field` ,'ã','ã');
update `table` set `field` = replace(`field` ,'á','á');
update `table` set `field` = replace(`field` ,'â','â');
update `table` set `field` = replace(`field` ,'é','é');
update `table` set `field` = replace(`field` ,'í','í');
update `table` set `field` = replace(`field` ,'õ','õ');
update `table` set `field` = replace(`field` ,'ú','ú');
update `table` set `field` = replace(`field` ,'ç','ç');
update `table` set `field` = replace(`field` ,'Á','Á');
update `table` set `field` = replace(`field` ,'Â','Â');
update `table` set `field` = replace(`field` ,'É','É');
update `table` set `field` = replace(`field` ,'Í','Í');
update `table` set `field` = replace(`field` ,'Õ','Õ');
update `table` set `field` = replace(`field` ,'Ú','Ú');
update `table` set `field` = replace(`field` ,'Ç','Ç');
update `table` set `field` = replace(`field` ,'Ã','Ã');
update `table` set `field` = replace(`field` ,'À','À');
update `table` set `field` = replace(`field` ,'Ê','Ê');
update `table` set `field` = replace(`field` ,'Ó','Ó');
update `table` set `field` = replace(`field` ,'Ô','Ô');
update `table` set `field` = replace(`field` ,'Ü','Ü');
update `table` set `field` = replace(`field` ,'ã','ã');
update `table` set `field` = replace(`field` ,'à','à');
update `table` set `field` = replace(`field` ,'ê','ê');
update `table` set `field` = replace(`field` ,'ó','ó');
update `table` set `field` = replace(`field` ,'ô','ô');
update `table` set `field` = replace(`field` ,'ü','ü');
update `table` set `field` = replace(`field` ,'&amp;','&');
update `table` set `field` = replace(`field` ,'&gt;','>');
update `table` set `field` = replace(`field` ,'&lt;','<');
update `table` set `field` = replace(`field` ,'&circ;','ˆ');
update `table` set `field` = replace(`field` ,'&tilde;','˜');
update `table` set `field` = replace(`field` ,'&uml;','¨');
update `table` set `field` = replace(`field` ,'&acute;','´');
update `table` set `field` = replace(`field` ,'&cedil;','¸');
update `table` set `field` = replace(`field` ,'&quot;','"');
update `table` set `field` = replace(`field` ,'&ldquo;','“');
update `table` set `field` = replace(`field` ,'&rdquo;','”');
update `table` set `field` = replace(`field` ,'&lsquo;','‘');
update `table` set `field` = replace(`field` ,'&rsquo;','’');
update `table` set `field` = replace(`field` ,'&lsaquo;','‹');
update `table` set `field` = replace(`field` ,'&rsaquo;','›');
update `table` set `field` = replace(`field` ,'&laquo;','«');
update `table` set `field` = replace(`field` ,'&raquo;','»');
update `table` set `field` = replace(`field` ,'&ordm;','º');
update `table` set `field` = replace(`field` ,'&ordf;','ª');
update `table` set `field` = replace(`field` ,'&ndash;','–');
update `table` set `field` = replace(`field` ,'&mdash;','—');
update `table` set `field` = replace(`field` ,'&macr;','¯');
update `table` set `field` = replace(`field` ,'&hellip;','…');
update `table` set `field` = replace(`field` ,'&brvbar;','¦');
update `table` set `field` = replace(`field` ,'&bull;','•');
update `table` set `field` = replace(`field` ,'&para;','¶');
update `table` set `field` = replace(`field` ,'&sect;','§');
update `table` set `field` = replace(`field` ,'&sup1;','¹');
update `table` set `field` = replace(`field` ,'&sup2;','²');
update `table` set `field` = replace(`field` ,'&sup3;','³');
update `table` set `field` = replace(`field` ,'&frac12;','½');
update `table` set `field` = replace(`field` ,'&frac14;','¼');
update `table` set `field` = replace(`field` ,'&frac34;','¾');
update `table` set `field` = replace(`field` ,'⅛','⅛');
update `table` set `field` = replace(`field` ,'⅜','⅜');
update `table` set `field` = replace(`field` ,'⅝','⅝');
update `table` set `field` = replace(`field` ,'⅞','⅞');
update `table` set `field` = replace(`field` ,'>','>');
update `table` set `field` = replace(`field` ,'<','<');
update `table` set `field` = replace(`field` ,'&plusmn;','±');
update `table` set `field` = replace(`field` ,'&minus;','−');
update `table` set `field` = replace(`field` ,'&times;','×');
update `table` set `field` = replace(`field` ,'&divide;','÷');
update `table` set `field` = replace(`field` ,'&lowast;','∗');
update `table` set `field` = replace(`field` ,'&frasl;','⁄');
update `table` set `field` = replace(`field` ,'&permil;','‰');
update `table` set `field` = replace(`field` ,'&int;','∫');
update `table` set `field` = replace(`field` ,'&sum;','∑');
update `table` set `field` = replace(`field` ,'&prod;','∏');
update `table` set `field` = replace(`field` ,'&radic;','√');
update `table` set `field` = replace(`field` ,'&infin;','∞');
update `table` set `field` = replace(`field` ,'&asymp;','≈');
update `table` set `field` = replace(`field` ,'&cong;','≅');
update `table` set `field` = replace(`field` ,'&prop;','∝');
update `table` set `field` = replace(`field` ,'&equiv;','≡');
update `table` set `field` = replace(`field` ,'&ne;','≠');
update `table` set `field` = replace(`field` ,'&le;','≤');
update `table` set `field` = replace(`field` ,'&ge;','≥');
update `table` set `field` = replace(`field` ,'&there4;','∴');
update `table` set `field` = replace(`field` ,'&sdot;','⋅');
update `table` set `field` = replace(`field` ,'&middot;','·');
update `table` set `field` = replace(`field` ,'&part;','∂');
update `table` set `field` = replace(`field` ,'&image;','ℑ');
update `table` set `field` = replace(`field` ,'&real;','ℜ');
update `table` set `field` = replace(`field` ,'&prime;','′');
update `table` set `field` = replace(`field` ,'&Prime;','″');
update `table` set `field` = replace(`field` ,'&deg;','°');
update `table` set `field` = replace(`field` ,'&ang;','∠');
update `table` set `field` = replace(`field` ,'&perp;','⊥');
update `table` set `field` = replace(`field` ,'&nabla;','∇');
update `table` set `field` = replace(`field` ,'&oplus;','⊕');
update `table` set `field` = replace(`field` ,'&otimes;','⊗');
update `table` set `field` = replace(`field` ,'&alefsym;','ℵ');
update `table` set `field` = replace(`field` ,'&oslash;','ø');
update `table` set `field` = replace(`field` ,'&Oslash;','Ø');
update `table` set `field` = replace(`field` ,'&isin;','∈');
update `table` set `field` = replace(`field` ,'&notin;','∉');
update `table` set `field` = replace(`field` ,'&cap;','∩');
update `table` set `field` = replace(`field` ,'&cup;','∪');
update `table` set `field` = replace(`field` ,'&sub;','⊂');
update `table` set `field` = replace(`field` ,'&sup;','⊃');
update `table` set `field` = replace(`field` ,'&sube;','⊆');
update `table` set `field` = replace(`field` ,'&supe;','⊇');
update `table` set `field` = replace(`field` ,'&exist;','∃');
update `table` set `field` = replace(`field` ,'&forall;','∀');
update `table` set `field` = replace(`field` ,'&empty;','∅');
update `table` set `field` = replace(`field` ,'&not;','¬');
update `table` set `field` = replace(`field` ,'&and;','∧');
update `table` set `field` = replace(`field` ,'&or;','∨');
update `table` set `field` = replace(`field` ,'&crarr;','↵');
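Rather than typing the Latin-1 part of this list by hand, you can generate the (broken, correct) pairs: encode each correct character as UTF-8 and read those bytes back as latin1. The Python sketch below assumes MySQL's latin1 is Windows-1252 with the five unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) kept as control characters. `table` and `field` are the answer's placeholders, and the function names are made up. Several keys (for Á, Í, Ï, Ð, Ý, í and the soft hyphen itself) end in an invisible character, which is what the "missing character" comments in the list refer to.

```python
from __future__ import annotations

# Assumption: MySQL's latin1 is cp1252, with these unassigned bytes decoded
# as the control characters U+0081, U+008D, U+008F, U+0090, U+009D.
UNASSIGNED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def latin1_decode(data: bytes) -> str:
    """Read each byte as one latin1 character."""
    return "".join(
        chr(b) if b in UNASSIGNED else bytes([b]).decode("cp1252") for b in data
    )


def mojibake_pairs(first: int = 0xA0, last: int = 0xFF) -> list[tuple[str, str]]:
    """(broken, correct) pairs for every code point in [first, last]."""
    pairs = []
    for code in range(first, last + 1):
        correct = chr(code)
        broken = latin1_decode(correct.encode("utf-8"))  # UTF-8 bytes misread as latin1
        pairs.append((broken, correct))
    return pairs


def to_sql(pairs: list[tuple[str, str]]) -> list[str]:
    """One UPDATE per pair; single quotes are doubled for SQL string literals."""
    template = "update `table` set `field` = replace(`field`, '{}', '{}');"
    return [template.format(b.replace("'", "''"), c.replace("'", "''")) for b, c in pairs]


for statement in to_sql(mojibake_pairs())[:3]:
    print(statement)
```

Calling mojibake_pairs(0x2018, 0x201D) produces the curly-quote keys in the same way.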
Upvotes: 24
Reputation: 23
Since the main question was about detecting broken characters, here is my solution (it avoids double-converting values that are already correctly encoded):
SELECT name FROM %table%
WHERE
CONVERT(CONVERT(name USING BINARY) USING utf8 ) != CONVERT(CONVERT(CONVERT(CONVERT(name USING BINARY) USING latin1) USING BINARY) USING utf8);
UPDATE %table% SET name = convert(cast(convert(name using latin1 ) as binary) using utf8 )
WHERE
CONVERT(CONVERT(name USING BINARY) USING utf8 ) != CONVERT(CONVERT(CONVERT(CONVERT(name USING BINARY) USING latin1) USING BINARY) USING utf8);
Upvotes: 0
Reputation: 731
Based on the data in this post https://www.i18nqa.com/debug/utf8-debug.html, I'd suggest this query for identifying dodgy entries and their likely correct values:
SELECT my_field,CONVERT(BINARY CONVERT(my_field USING latin1) USING utf8mb4) AS new_field_value FROM my_table WHERE my_field REGEXP '[âÆËÅÂÃ]';
Be very careful: we had a badly encoded file name but a correctly encoded path, and in that case some of the solutions above would have caused a world of pain. If some of your data is already correctly encoded in UTF-8, you'll likely find you've lost a chunk of it.
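Following that advice, it helps to preview the proposed values before running any UPDATE. This Python sketch applies the same character class as the REGEXP above and proposes the converted value. Values that do not convert cleanly (typically text that was already correct, like 'Pâques') are flagged instead of changed. The latin1 emulation (cp1252 plus the five unassigned bytes) is an assumption, and `preview` is a hypothetical helper.

```python
from __future__ import annotations

import re

# Assumption: MySQL's latin1 is cp1252, with these unassigned bytes kept as-is.
UNASSIGNED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}
SUSPECT = re.compile("[âÆËÅÂÃ]")  # same class as the REGEXP in the query above


def latin1_encode(text: str) -> bytes:
    """Turn each character back into its single latin1 byte."""
    return b"".join(
        bytes([ord(ch)]) if ord(ch) in UNASSIGNED else ch.encode("cp1252")
        for ch in text
    )


def preview(values: list[str]) -> list[tuple[str, str | None]]:
    """Return (current, proposed) for suspicious values; None means 'leave it alone'."""
    report = []
    for value in values:
        if not SUSPECT.search(value):
            continue  # no lead character of a typical mojibake sequence
        try:
            proposed = latin1_encode(value).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            proposed = None  # not a clean round trip: probably already correct
        report.append((value, proposed))
    return report


for current, proposed in preview(["CafÃ©", "Café", "Ã‰cole", "Pâques"]):
    print(f"{current!r} -> {proposed!r}")
```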
Upvotes: 1
Reputation: 14747
This is an extension of @Thales Ceolin's answer to modify every table in the db:
select concat(
"update ", a.TABLE_NAME,
" set ", b.COLUMN_NAME,
" = CONVERT(BINARY CONVERT(", b.COLUMN_NAME, " USING latin1) USING utf8) where ",
b.COLUMN_NAME, " is not null;") query
from INFORMATION_SCHEMA.TABLES a
join INFORMATION_SCHEMA.COLUMNS b
on a.TABLE_SCHEMA = b.TABLE_SCHEMA and a.TABLE_NAME = b.TABLE_NAME
where a.TABLE_SCHEMA = 'db_name'
and a.TABLE_TYPE = 'BASE TABLE'
and b.DATA_TYPE in ('text', 'varchar')
-- drop the next condition to generate statements for every table in db_name
and a.TABLE_NAME = 'table_name';
This will result in:
update table_name set idn = CONVERT(BINARY CONVERT(idn USING latin1) USING utf8) where idn is not null;
update table_name set name = CONVERT(BINARY CONVERT(name USING latin1) USING utf8) where name is not null;
update table_name set primary_last_name = CONVERT(BINARY CONVERT(primary_last_name USING latin1) USING utf8) where primary_last_name is not null;
Upvotes: 1
Reputation: 101
You might have some rows with properly encoded UTF-8 and others with wrongly encoded characters. In that case, CONVERT(BINARY CONVERT(post_title USING latin1) USING utf8) will truncate some fields.
I ended up doing it this way:
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ä" USING latin1),'ä');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ö" USING latin1),'ö');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ü" USING latin1),'ü');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ä" USING latin1),'Ä');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ö" USING latin1),'Ö');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ü" USING latin1),'Ü');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ß" USING latin1),'ß');
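The CONVERT(BINARY "ä" USING latin1) expression builds the broken form of each character, so only those exact sequences are touched and correctly encoded text is left alone. Here is the same idea in Python, as a sketch. It assumes latin1 is cp1252 with the five unassigned bytes kept as control characters, and the helper names are hypothetical.

```python
# Assumption: MySQL's latin1 is cp1252, with these unassigned bytes decoded
# as the matching control characters.
UNASSIGNED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def broken_form(ch: str) -> str:
    """Like CONVERT(BINARY "ch" USING latin1): ch's UTF-8 bytes read as latin1."""
    return "".join(
        chr(b) if b in UNASSIGNED else bytes([b]).decode("cp1252")
        for b in ch.encode("utf-8")
    )


def targeted_fix(text: str, chars: str = "äöüÄÖÜß") -> str:
    """Replace only the broken forms of the given characters."""
    for ch in chars:
        text = text.replace(broken_form(ch), ch)
    return text


print(targeted_fix("KÃ¤se und Käse"))  # Käse und Käse
```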
Upvotes: 2
Reputation: 21
This also solved my problem with some Italian characters:
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¡','á');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¤','ä');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã³','ó');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ãº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã±','ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í‘','Ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã','í');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'â€“','–');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€™','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¦','...');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€“','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€œ','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€˜','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¢','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¡','c');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Â','');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í ','à');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í¨','è');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íˆ','È');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'â‚¬','€');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'eÌ€','è');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í²','ò');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í¹','ù');
Upvotes: 2
Reputation: 519
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¡','á');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¤','ä');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã³','ó');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ãº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã±','ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í‘','Ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã','í');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'â€“','–');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€™','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¦','...');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€“','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€œ','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€˜','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¢','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¡','c');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Â','');
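With lists like this one, the order of the REPLACE calls matters. A short key that is a prefix of longer ones rewrites the start of every other sequence. One example is a bare 'Ã', if the invisible soft hyphen of the í sequence gets lost while copying. This is likely where leftovers such as 'í©' and 'íº' in these lists come from. Below is a minimal Python sketch of the effect, and of applying longer keys first as a simple safeguard; the function names are hypothetical.

```python
from __future__ import annotations


def apply_in_order(text: str, pairs: list[tuple[str, str]]) -> str:
    """Apply (broken, fixed) replacements one after another, like chained UPDATEs."""
    for broken, fixed in pairs:
        text = text.replace(broken, fixed)
    return text


def apply_longest_first(text: str, pairs: list[tuple[str, str]]) -> str:
    """Same, but longer keys run first so a short key cannot eat their prefix."""
    ordered = sorted(pairs, key=lambda pair: len(pair[0]), reverse=True)
    return apply_in_order(text, ordered)


pairs = [("Ã", "í"), ("Ã©", "é")]  # bare 'Ã' listed before the longer key
print(apply_in_order("cafÃ©", pairs))       # cafí©
print(apply_longest_first("cafÃ©", pairs))  # café
```

Sorting by length is only a heuristic: it prevents prefix collisions like this one but does not resolve every possible overlap between keys.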
Upvotes: 41
Reputation: 2600
This saved my life
UPDATE ohp_posts SET post_content = CONVERT(CAST(CONVERT(post_content USING latin1) AS BINARY) USING utf8)
I found it here: http://stanis.net/2014/04/replacing-latin-1-with-utf-8-characters-in-mysql/
Upvotes: 16
Reputation: 17215
In addition to Raúl Avila Solano's and acseven's answers, if you want to update all the broken characters in one query, you can do:
update `table` set field = replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(field,'ü','ü'),'ô','ô'),'ó','ó'),'ê','ê'),'à','à'),'ã','ã'),'Ü','Ü'),'Ô','Ô'),'Ó','Ó'),'Ê','Ê'),'À','À'),'Ã','Ã'),'Ç','Ç'),'Ú','Ú'),'Õ','Õ'),'Í','Í'),'Í','Í'),'É','É'),'Â','Â'),'Á','Á'),'ç','ç'),'ú','ú'),'õ','õ'),'í','í'),'é','é'),'â','â'),'á','á'),'ã','ã'),'ç','ç'),'à ','à'),'à ','à'),'º','º'),'ª','ª'),'ç','ç'),'–','–'),'ó','ó'),'é','é'),'á','á'),'ê','ê'),'ã','ã'),'â','â'),'Ã','í'),'õ','õ'),'Ø','Ø'),'•','-'),'ú','ú'),'à ','À'),'Ã','Ã'),'Ç','Ç'),'â€','"'),'“','"'),'É','É');
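Writing that nested expression by hand is error-prone, so you may prefer to generate it from a list of pairs. In this Python sketch (the function name is hypothetical), the first pair ends up innermost and is therefore applied first, as in the query above.

```python
from __future__ import annotations


def nested_replace_sql(column: str, pairs: list[tuple[str, str]]) -> str:
    """Build replace(replace(col,'a','b'),'c','d') from [(a, b), (c, d)]."""
    expr = column
    for broken, fixed in pairs:
        bad = broken.replace("'", "''")  # double single quotes inside SQL literals
        good = fixed.replace("'", "''")
        expr = f"replace({expr},'{bad}','{good}')"
    return expr


pairs = [("Ã¼", "ü"), ("Ã´", "ô"), ("â€™", "'")]
print(f"update `table` set field = {nested_replace_sql('field', pairs)};")
```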
Upvotes: 2
Reputation: 2644
I fixed it with:
UPDATE wp_zcs9ck_posts_copy SET post_title =
CONVERT(BINARY CONVERT(post_title USING latin1) USING utf8);
Complete solution: http://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/
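Doubly encoded text typically comes from UTF-8 bytes being sent over a connection (or loaded into a column) declared as latin1. Each byte becomes its own character, and those characters are later stored as UTF-8 again. The Python sketch below shows that path and the inverse used above. It assumes MySQL's latin1 behaves like cp1252 with the five unassigned bytes kept as control characters; the helper names are hypothetical.

```python
# Assumption: MySQL's latin1 is cp1252, with these unassigned bytes kept as
# the control characters U+0081, U+008D, U+008F, U+0090, U+009D.
UNASSIGNED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def double_encode(text: str) -> str:
    """Corruption path: UTF-8 bytes misread as latin1 characters."""
    return "".join(
        chr(b) if b in UNASSIGNED else bytes([b]).decode("cp1252")
        for b in text.encode("utf-8")
    )


def undo(text: str) -> str:
    """Inverse: characters back to latin1 bytes, bytes read as UTF-8."""
    raw = b"".join(
        bytes([ord(ch)]) if ord(ch) in UNASSIGNED else ch.encode("cp1252")
        for ch in text
    )
    return raw.decode("utf-8")


for original in ["étudiantes", "África", "“quoted”"]:
    broken = double_encode(original)
    print(repr(broken), undo(broken) == original)
```

The broken form of ” ends in an invisible U+009D. If a key like 'â€' is copied without that character, it also matches the start of every other curly-quote and dash sequence.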
Upvotes: 89
Reputation: 2282
I had the same problem but didn't like the replace() solution, because there's always the possibility of missing some characters. I was working against a column with mixed data (some values had been utf8_encode()d and some not): 4 million or so rows, about 250k of them with mis-encoded data (with É/etc characters), covering about 15 languages, mainly European but also Russian, Japanese and Chinese.
I started by duplicating the column, since I didn't want to lose any data:
ALTER TABLE images ADD COLUMN reptitle TEXT;
Then I copied all the values with multibyte characters (thanks Adam for the tip):
UPDATE images SET reptitle = title WHERE LENGTH(title) != CHAR_LENGTH(title);
Since reptitle was created with the table's default character set, it was already utf8, but it contained the corrupted data because the images table used to be a latin1 source. Column reptitle now contains some correctly encoded data and some corrupted data (all the values with multibyte characters, some of which had been correctly utf8_encode()d). So then, with David's tip:
ALTER TABLE images MODIFY reptitle TEXT character set latin1;
ALTER TABLE images MODIFY reptitle BLOB;
ALTER TABLE images MODIFY reptitle TEXT character set utf8;
The middle step is needed because TEXT and BLOB are not the same: changing a latin1 TEXT column straight to utf8 converts the characters, whereas going through BLOB keeps the raw bytes so they are reinterpreted as UTF-8. This had the effect of correcting all the incorrectly encoded data ('Ã©tudiantes' became 'étudiantes', etc.), but data that was previously correct was truncated at the first multibyte character ('Lapin de Pâques' became 'Lapin de P'). I don't know why the truncation happens, but it's in a disposable column, so I didn't care. The truncated data has the same CHAR_LENGTH and LENGTH because no multibyte characters remain, so the query is easy:
UPDATE images SET title = reptitle WHERE LENGTH(reptitle)!=CHAR_LENGTH(reptitle);
Then, of course, just drop the spare column:
ALTER TABLE images DROP COLUMN reptitle;
Also, make sure all your script files are UTF-8 (without BOM) and that you are using the following (I use PHP, and this has tripped me up a couple of times, so I thought I'd mention it here):
mysql_set_charset('utf8', $connection);
Et voilà... perfectly repaired data, all languages :)
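You can emulate the three ALTER statements to see why correct values get truncated. This Python sketch is an emulation under assumptions, not MySQL itself. Step 1 stores each character as one latin1 byte, step 2 keeps the bytes, and step 3 reads them as UTF-8, stopping at the first invalid sequence to mirror the truncation described above. Characters latin1 cannot hold (Cyrillic, CJK) make this sketch raise UnicodeEncodeError; the function name is hypothetical.

```python
# Assumption: MySQL's latin1 is cp1252, with these unassigned bytes kept as-is.
UNASSIGNED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def alter_latin1_blob_utf8(value: str) -> str:
    # Step 1: MODIFY ... TEXT CHARACTER SET latin1 -> one latin1 byte per character.
    raw = b"".join(
        bytes([ord(ch)]) if ord(ch) in UNASSIGNED else ch.encode("cp1252")
        for ch in value
    )
    # Step 2: MODIFY ... BLOB -> the bytes are kept unchanged, no conversion.
    # Step 3: MODIFY ... TEXT CHARACTER SET utf8 -> the bytes are read as UTF-8;
    # this sketch keeps everything before the first invalid sequence.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return raw[: err.start].decode("utf-8")


print(alter_latin1_blob_utf8("Ã©tudiantes"))      # étudiantes
print(alter_latin1_blob_utf8("Lapin de Pâques"))  # Lapin de P
```

'Pâques' becomes the bytes 50 E2 71 ...: 0xE2 announces a three-byte UTF-8 sequence, but 0x71 ('q') is not a continuation byte, so the value is cut right after 'P'.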
Upvotes: 3
Reputation: 4110
No text replacement is a universal solution, because you can always miss some character. A more suitable fix for double-converted characters is to change the column to latin1, then to a binary type, and then back to utf8, like this:
alter table descriptions modify name VARCHAR(2000) character set latin1;
alter table descriptions modify name blob;
alter table descriptions modify name VARCHAR(2000) character set utf8;
Upvotes: 17
Reputation: 3369
The SELECT statement you need is the following:
SELECT * FROM TABLE WHERE LENGTH(name) != CHAR_LENGTH(name);
This returns all rows which contain multi-byte characters.
name is assumed to be the field where the weird characters would be found.
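LENGTH() counts bytes and CHAR_LENGTH() counts characters, so they differ exactly when a value holds at least one character stored in more than one byte. The Python sketch below runs the same test for a utf8 column (the function name is hypothetical). Correctly encoded text such as 'Pâques' is flagged too, so the query returns candidates rather than only broken rows.

```python
def has_multibyte(value: str) -> bool:
    """Python analogue of LENGTH(value) != CHAR_LENGTH(value) for a utf8 column."""
    return len(value.encode("utf-8")) != len(value)


for sample in ["plain", "Pâques", "PÃ¢ques"]:
    print(sample, has_multibyte(sample))
```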
Upvotes: 16