Reputation: 31
I have an ongoing project where I need to fetch Arabic text from a MySQL table and also insert/update it from time to time. My database collation is "utf8_general_ci".
At first I got question marks "???" when fetching some of the Arabic data. Then I executed "SET CHARACTER SET utf8". That fixed the question marks for that particular data, but then other Arabic data started showing up as gibberish such as "Ùؤتا". In the project I also need to fetch some data from a CSV file containing Arabic text.
Here is the JSON output from before and after executing the charset statement:
[{
"id": 148,
"domain": 0,
"group_name": "ATX ??????????",
"score": 0,
"player_name": "لاعب واحد",
"created_at": "2015-10-26 13:01:23"
},
{
"id": 148,
"domain": 0,
"group_name": "???? ???????",
"score": 1,
"player_name": "اثنين من لاعب",
"created_at": "2015-10-26 12:59:57"
}]
// ---------------------------------------
// After executing "SET CHARACTER SET utf8"
// ---------------------------------------
[{
"id": 148,
"domain": 0,
"group_name": "ATX توكوروزاوا",
"score": 0,
"player_name": "مؤتا",
"created_at": "2015-10-26 13:01:23"
},
{
"user_id": 148,
"domain": 0,
"group_name": "لندن دينيموز",
"score": 1,
"player_name": "كابوا",
"created_at": "2015-10-26 12:59:57"
}]
Can anyone tell me what is wrong here? I'm trying to fix this but can't find any solution.
Upvotes: 3
Views: 368
Reputation: 142208
Ùؤتا is Mojibake for مؤتا. The likely causes:
- The connection was using SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
- The column may or may not have been declared CHARACTER SET utf8, but it should have been that.
لاعب may be a "double encoding"; avoid that path.
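As a rough illustration of how this kind of Mojibake arises, the Python sketch below encodes the correct text as utf8 and then misreads those bytes as latin1, which is roughly what happens when utf8 bytes travel over a latin1 connection. Python's latin-1 codec is only a stand-in for MySQL's latin1 here, so the garbled characters it produces may not match the question's output exactly.

```python
# Sketch: utf8 bytes misinterpreted as latin1 produce Mojibake.
original = "مؤتا"

utf8_bytes = original.encode("utf-8")    # the bytes the client actually holds
garbled = utf8_bytes.decode("latin-1")   # the same bytes read as latin1

# Each Arabic character is two utf8 bytes, so the garbled text is longer.
print(len(original), len(utf8_bytes), len(garbled))
print(garbled)

# The damage is reversible as long as no byte was dropped along the way.
restored = garbled.encode("latin-1").decode("utf-8")
print(restored == original)
```

Because nothing is lost in this direction, Mojibake can usually be repaired, unlike the question-mark case.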
"ATX ??????????"
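-- question marks like these appear when text is converted to a character set that cannot represent it, for example Arabic into latin1.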
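The question-mark symptom can be reproduced outside MySQL. The sketch below is only an analogy, using Python's latin-1 codec in place of MySQL's latin1: converting the Arabic group name to latin1 with replacement leaves one "?" per character that has no latin1 equivalent.

```python
# Sketch: converting Arabic text to latin1 replaces each character
# that latin1 cannot represent with "?".
text = "ATX توكوروزاوا"

lossy = text.encode("latin-1", errors="replace")
print(lossy)

# The converted copy holds only "?" bytes for the Arabic part,
# so the original characters cannot be recovered from it.
print(lossy.decode("latin-1"))
```

The ASCII prefix survives because it is representable in both character sets.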
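utf8 needs to be established in about 4 places.
- The column(s) in the database: use SHOW CREATE TABLE to verify that they are explicitly set to utf8, or defaulted from the table definition. (It is not enough to change the database default.)
- The connection between the client and the server: SET NAMES utf8.
- The bytes you have in the client: they must be utf8-encoded.
- If the text is displayed in a web page, the <meta> tag.
See also UTF-8 all the way through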
Upvotes: 2
Reputation: 1447
It looks like you have text in 2 different character encodings in your database: utf8 and, I'm guessing, latin1. You'll have to settle on one of them (I suggest utf8) and convert the text that is in the other encoding to match.
Try something like this to test for the correct encoding:
SELECT group_name, CONVERT(player_name USING utf8) FROM your_table;
If the output is correct, you can then correct the data permanently with:
UPDATE your_table SET player_name = CONVERT(player_name USING utf8);
See https://dev.mysql.com/doc/refman/5.0/en/charset-convert.html
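When rows in both states are mixed in one table, it can help to check each value before changing anything permanently. The sketch below is a client-side Python check, not the SQL CONVERT approach above: it assumes the broken values are utf8 bytes that were read as latin1, and the helper name repair_if_mojibake is hypothetical.

```python
def repair_if_mojibake(value):
    """Return the utf8 reading of value if it looks like utf8 misread as latin1."""
    try:
        return value.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either the text has characters outside latin1 (it is already
        # real Arabic) or its bytes are not valid utf8: leave it alone.
        return value

good = "لاعب واحد"
bad = good.encode("utf-8").decode("latin-1")  # simulated mis-encoded row

for row in (good, bad, "ATX"):
    print(repr(row), "->", repr(repair_if_mojibake(row)))
```

Correct Arabic and plain ASCII pass through unchanged, so the check is safe to run over every row under that assumption.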
Upvotes: 0