facebook-1628410276

Reputation: 31

Arabic text in PHP/MySQL sometimes appears as "???" and sometimes as "Ùؤتا" after SELECT/INSERT statements

I have an ongoing project where I need to fetch Arabic text from a MySQL table and also insert/update it from time to time. My database collation is "utf8_general_ci".

At first I got question marks "???" when fetching some of the Arabic data. I then executed "SET CHARACTER SET utf8". That fixed the question marks for that particular data, but other Arabic data then started showing up as gibberish like "Ùؤتا". The project also needs to read some data from a CSV file containing Arabic text.

Here is the JSON data I got before and after executing the charset statement:

[{
  "id": 148,
  "domain": 0,
  "group_name": "ATX ??????????",
  "score": 0,
  "player_name": "لاعب واحد",
  "created_at": "2015-10-26 13:01:23"
},
{
  "id": 148,
  "domain": 0,
  "group_name": "???? ???????",
  "score": 1,
  "player_name": "اثنين من لاعب",
  "created_at": "2015-10-26 12:59:57"
}]

// ---------------------------------------
// After executing "SET CHARACTER SET utf8"
// ---------------------------------------  


[{
  "id": 148,
  "domain": 0,
  "group_name": "ATX توكوروزاوا",
  "score": 0,
  "player_name": "مؤتا",
  "created_at": "2015-10-26 13:01:23"
},
{
  "user_id": 148,
  "domain": 0,
  "group_name": "لندن دينيموز",
  "score": 1,
  "player_name": "كابوا",
  "created_at": "2015-10-26 12:59:57"
}]

Can anyone tell me what is wrong here? I'm trying to fix this but can't find any solution.

Upvotes: 3

Views: 368

Answers (2)

Rick James

Reputation: 142208

Ù…Ø¤ØªØ§ is Mojibake for مؤتا:

  • The bytes you have in the client are correctly encoded in utf8 (good).
  • You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
  • The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.
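To see how this kind of Mojibake arises, here is a minimal Python sketch (not PHP, and not touching MySQL) that reinterprets UTF-8 bytes as cp1252. Python's cp1252 codec is assumed here to be a close stand-in for MySQL's latin1, and the variable names are illustrative only.

```python
# Correct Arabic text, as the client holds it.
original = "مؤتا"

# The client sends UTF-8 bytes (two bytes per Arabic letter)...
utf8_bytes = original.encode("utf-8")

# ...but the connection is treated as latin1, so every byte is read
# as one separate character.
# Assumption: Python's cp1252 codec approximates MySQL's latin1.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)

# As long as no bytes were lost, the misreading can be reversed.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == original)
```

The round trip works only because every byte survived; once characters have been replaced by "?", there is nothing left to repair.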

لاعب may be a "double encoding" -- avoid that path.

"ATX ??????????" --

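The question marks can be reproduced in the same hedged way: when text is converted into a character set that has no Arabic letters, each letter is typically replaced by "?", and the original cannot be recovered from the result. A minimal Python sketch, assuming latin1 as the target character set; MySQL's own conversion is only approximated here by Python's `errors="replace"`:

```python
# Text containing Latin and Arabic letters, taken from the question's data.
text = "ATX توكوروزاوا"

# latin1 has no Arabic letters, so each one is replaced by "?".
# The ASCII part ("ATX ") passes through unchanged.
lossy = text.encode("latin-1", errors="replace").decode("latin-1")
print(lossy)

# The replacement is one "?" per character, so the length is preserved,
# but the Arabic letters themselves are gone for good.
print(len(lossy) == len(text))
```
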
utf8 needs to be established in about 4 places.

  • The column(s) in the database -- Use SHOW CREATE TABLE to verify that they are explicitly set to utf8, or defaulted from the table definition. (It is not enough to change the database default.)
  • The connection between the client and the server. See SET NAMES utf8.
  • The bytes you have. (This is probably the case.)
  • If you are displaying the text in a web page, check the <meta> tag.
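For the third point, and for the CSV data mentioned in the question, one way to check that the bytes in hand really are UTF-8 is to try decoding them strictly. A minimal Python sketch; the helper name `is_valid_utf8` is hypothetical, and cp1256 (Windows Arabic) is used only as an assumed example of a non-UTF-8 source encoding:

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


sample = "لاعب واحد"

# Bytes as a UTF-8 CSV file would contain them.
print(is_valid_utf8(sample.encode("utf-8")))

# The same text saved in a legacy Arabic code page (assumed: cp1256).
print(is_valid_utf8(sample.encode("cp1256")))
```

A clean decode does not prove the data was meant as UTF-8, but legacy single-byte Arabic text almost never happens to form valid UTF-8, so a failure is a strong hint that the file needs converting before it is inserted.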

See also UTF-8 all the way through

Upvotes: 2

Francis Eytan Dortort

Reputation: 1447

It looks like your database holds text in two different character encodings: utf8 and, I'm guessing, latin1. You'll have to settle on one (I suggest utf8) and convert the text stored in the other encoding to match.

Try something like this to test for the correct encoding:

SELECT group_name, CONVERT(player_name USING utf8) FROM your_table;

If the output is correct, you can then correct the data permanently with:

UPDATE your_table SET player_name = CONVERT(player_name USING utf8);

See https://dev.mysql.com/doc/refman/5.0/en/charset-convert.html

Upvotes: 0
