Reputation: 207
I've been using for a long time a database/connection with the wrong encoding, resulting the hebrew language characters in the database to display as unknown-language characters, as the example shows below:
I want to re-import/change the database with the inserted-wrong-encoded characters to the right encoded characters, so the hebrew characters will be displayed as hebrew characters and not as unknown parse like *"× ×תה מסכי×,×× ×©×™× ×ž×¦×™×¢×™× ×œ×™ כמה ×”× "*
For the record, when I display this unknown characters sql data with php - it shows as hebrew. when I'm trying to access it from the phpMyAdmin Panel - it shows as jibrish (these unknown characters).
Is there any way to fix it although there is some data already inserted in the database?
Upvotes: 0
Views: 837
Reputation: 142538
That feels like "double-encoded" Hebrew strings.
This partially recovers the text:
UNHEX(HEX(CONVERT('× ×תה מסכי×,××' USING latin1)))
--> '� �תה מסכי�,��
I do not know what leads to the �
symbols.
Please do SELECT col, HEX(col) FROM ... WHERE ...;
for some cell. I would expect שלום
to give hex D7A9D79CD795D79D
if it were correctly stored. For "double encoding", I would expect C397C2A9C397C593C397E280A2C397C29D
.
Please provide the output from that SELECT, then I will work on how to recover the data.
Edit
Here's what I think happened.
SET NAMES latin1
lied by claiming that the client had latin1 encoding; andCHARACTER SET utf8
.Yod did not jump out as a letter, so it took a while to see it. CONVERT(BINARY(CONVERT('×™×™123' USING latin1)) USING utf8)
-->יי123
So, I am thinking that that expression will clean up the text. But be cautious; try it on a few rows before 'fixing' the entire table.
UPDATE table SET col = CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8) WHERE ...;
If that does not work, here are 4 fixes for double-encoding that may or may not be equivalent. (Note: BINARY(xx)
is probably the same as CONVERT(xx USING binary)
.)
Upvotes: 1
Reputation: 3176
I am not sure that you can do anything about the data that has already been stored in the database. However, you can import hebrew data properly by making sure you have the correct character set and collation.
utf8_general_ci
utf8_general_ci
for example:
CREATE DATABASE col CHARACTER SET utf8 COLLATE utf8_general_ci;
CREATE TABLE `col`.`hebrew` (
`id` INT NOT NULL AUTO_INCREMENT,
`heb` VARCHAR(45) NOT NULL,
PRIMARY KEY (`id`)
) CHARACTER SET utf8
COLLATE utf8_general_ci;
INSERT INTO hebrew(heb) values ('שלום');
Upvotes: 0