Reputation: 3796
I am trying to use below query to enforce UTF-8 encoding on a column data.
SELECT convert(binary convert(summary using latin1) using utf8) FROM blogs b;
The below warnings getting logged in mysql console
What are below character represent in the above warnings?
'AE2065','927320','927320','927320','F16120','C02070','AE2043','F16F6C','93426F','93426F','E4646F','E46E73','E46DE4','E47974','C02070','934355','934355'
and how can I find out corresponding actual character symbols for above list?
Upvotes: 1
Views: 5541
Reputation: 142298
Well, given these source character sets, you get this text. For example, several treat AE
as ®
CONVERT(CONVERT(UNHEX('AE2065') USING %s) USING utf8mb4):
cp1250, cp1251, cp1256, cp1257, geostd8, hebrew, latin1, latin5, latin7
4 3 '® e'
cp850, cp852, keybcs2 4 3 '« e'
cp866 4 3 'о e'
hp8 4 3 'Û e'
latin2 4 3 'Ž e'
macce 4 3 'ģ e'
macroman 4 3 'Æ e'
armscii8 5 3 '… e'
cp932, sjis 5 3 'ョ e'
koi8r, koi8u 5 3 '╝ e'
tis620 5 3 'ฎ e'
For the 92
, a likely interpretation is ’
CONVERT(CONVERT(UNHEX('927320') USING %s) USING utf8mb4)
cp932, sjis 4 2 '痴 '
euckr 4 2 '뭩 '
gbk 4 2 '抯 '
cp850 4 3 'Æs '
cp852 4 3 'ĺs '
cp866 4 3 'Тs '
keybcs2 4 3 'Žs '
macce, macroman 4 3 'ís '
cp1250, cp1251, cp1256, cp1257, geostd8, latin1
5 3 '’s '
So far, I see that latin1
is common. I'll try another:
CONVERT(CONVERT(UNHEX('F16120') USING %s) USING utf8mb4)
big5 4 2 '鎙 '
cp932 4 2 ' '
gbk 4 2 '馻 '
armscii8 4 3 'րa '
cp1250, cp1257, dec8, latin1, latin2, latin5, latin7
4 3 'ña '
cp1251 4 3 'сa '
cp850, keybcs2 4 3 '±a '
cp852 4 3 '˝a '
cp866 4 3 'ёa '
greek 4 3 'ρa '
hebrew 4 3 'סa '
hp8 4 3 'þa '
koi8r, koi8u 4 3 'Яa '
macce 4 3 'Ůa '
macroman 4 3 'Òa '
tis620 5 3 '๑a '
Again, latin1 seems likely.
So,... What you need to do is declare that the bytes that you are feeding to MySQL are encoded in latin1
. Then they will be automatically converted to the declaration of the column, which should be utf8
(or utf8mb4
if you want Chinese and Emoji).
Since you seem to already have latin1 bytes in the table (in column summary
), it is not a question of whether you have utf8 or not; you do not have utf8. And the column is not declared utf8.
So, what do you really want do do?? Perhaps you want to ALTER
the table to change the character set to utf8? See
ALTER TABLE ... CONVERT TO ...
What you had was close:
CONVERT(BINARY CONVERT(UNHEX('AE2065') USING latin1) USING utf8) --> Warning
CONVERT( CONVERT(UNHEX('AE2065') USING latin1) USING utf8) --> '® e'
But, then what will you do with it? If the column is latin1, and you stick the converted stuff back in, you will get "double encoding", making things worse.
Perhaps you also have utf8-encoded characters sitting in a latin1 column? That's a mess -- "double encoding".
See Trouble with UTF-8 characters; what I see is not what I stored then further elaborate on your Question.
Upvotes: 2
Reputation: 7266
I am guessing that your data is stored as latin-1
charset. You could convert it as following:
SELECT convert(cast(convert(summary using latin1) as binary) using utf8) FROM blogs b;
Did you actually intend to store your data as latin-1
? If you intend to store your summary
as UTF-8
you could update your table as follows:
UPDATE blogs SET
summary=convert(cast(convert(summary using latin1) as binary) using utf8)
WHERE 1
Upvotes: 0