user3783604

Reputation: 19

Retrieve Arabic script from mysql 5.7.28

I work on an old website with mysql and php. The database is utf8 and collation latin1_swedish_ci. I changed one table and its columns to utf8_general_ci and was able to put arabic script into the db. Then I added

mysql_query("SET NAMES 'utf8' COLLATE 'utf8_general_ci'");
mysql_query("SET CHARACTER SET utf8");

to the php code.

Now I was able to get some Arabic out, but some characters are missing. [image]

I can't find someone with the same problem and I am at the end of my capabilities.

To answer your questions:

Inside the head I got this:

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

The database holds the right information in the right way: على بعد حوالي 1 كم شمال قبة الهواء، على الضفة الغربية لنهر النيل، صادف العمال جبانة غير معروفة أثناء توسع قرية غرب أسوان النوبية. سُرقت بعض مقابر هذة الجبانة بعد وقت قصير من اكتشافها. حتى أنه في بعض الحالات تم استخدام معدات كبيرة مثل اللوادر لأقتلاع العناصر المعمارية، ومنها على سبيل المثال عتب مدخل من أحدى المقابر. وفي مطلع عام 2013 كانت هناك تقارير متزايدة في الصحافة المصرية والدولية حول اكتشاف هذة المقابر وبداية تدميرها.

This is the same text as above. Only a few arabic characters are weird. Some come out correctly.

So it seems to be happening when I query the database.

if(!@mysql_connect($_SESSION['hostname'],$_SESSION['username'],$_SESSION['password'])){
    echo("<p>Zugangsdaten falsch.xxxx</p>");
    exit();
}
if(!@mysql_select_db($_SESSION['dbname'])){
    echo("<p>Verbindung zum Datenbankserver zur Zeit nicht möglich.</p>");
    exit();
}

mysql_query("SET NAMES 'utf8' COLLATE 'utf8_general_ci'");
mysql_query("SET CHARACTER SET utf8");
$alle_texte=@mysql_query("SELECT folge, id, bild01, bild02, bild03, bild04, bild05, bild06, bild07, bild08, bild09, bildclass, ".$text." as text, ".$dat_h2.", ".$dat_h3.", ".$dat_h4.", ".$dat_h5.", ".$dat_h6.", link01, link02, link03, link04, link05, link06, link07, link08, link09, link10  FROM ".$db_tabelle." WHERE folge >= ".$folge_min." AND zeigen = '1' AND folge <= ".$folge_max." ORDER BY folge");

BTW, after I added "SET NAMES 'utf8' COLLATE 'utf8_general_ci'", the "????" that had replaced all the Arabic characters went away. That problem seems to be well documented, but it seems I only got halfway to the solution.

Upvotes: 0

Views: 565

Answers (1)

Rick James

Reputation: 142208

The database is utf8 and collation latin1_swedish_ci. I changed one table and its columns to utf8_general_ci and was able to put arabic script into the db. Then I added

Some of that is relevant, but there are missing pieces. Please provide

  • SHOW CREATE TABLE before the change.
  • The SQL statement(s) you used to make the change.
  • SHOW CREATE TABLE after the change.

The settings on the database are not relevant.

Also, please provide this for a small amount of Arabic text:

SELECT col, HEX(col) FROM ... WHERE ...

For example, if the column has على بعد حوالي, the HEX will say

D8B9 D984 D989 D8A8 D8B9 D8AF D8AD D988 D8A7 D984 D98A

(I added the spacing to emphasize that Arabic is D8xx or D9xx -- 2 bytes per Arabic character.)
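As a cross-check outside MySQL, the expected bytes can be reproduced in a few lines of Python. This is only a sketch: the sample string is the one quoted above, the variable names are made up, and the spaces are dropped because MySQL's HEX() would also report each of them as 20.

```python
# Sample text from the example above (three words of the stored Arabic).
sample = "على بعد حوالي"

# Drop the spaces so that only Arabic letters remain, then encode as UTF-8.
letters = sample.replace(" ", "")
raw = letters.encode("utf-8")

# Group the hex output two bytes at a time, one group per Arabic letter.
hex_groups = [raw[i:i + 2].hex().upper() for i in range(0, len(raw), 2)]
print(" ".join(hex_groups))

# Every letter in this sample takes exactly two bytes, led by D8 or D9.
assert len(raw) == 2 * len(letters)
assert all(g.startswith(("D8", "D9")) for g in hex_groups)
```

The printed groups should line up with the D8xx/D9xx list above if the column really holds clean UTF-8.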

If, instead, the hex is something like

C398 C2B9 C399 E2809E C399 E280B0 ...

then the data in the table is already messed up.
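That second pattern is what typically results when correct UTF-8 bytes are read as latin1 and then converted to UTF-8 a second time. A minimal Python sketch of that round trip, assuming cp1252 as a stand-in for MySQL's latin1 and using only the first two letters of the sample:

```python
# First two letters of the sample text: ع (U+0639) and ل (U+0644).
letters = "\u0639\u0644"

# Step 1: the correct UTF-8 bytes, as they should appear in HEX(col).
good = letters.encode("utf-8")
print(good.hex().upper())    # D8B9D984

# Step 2: misread those bytes as cp1252 (assumed stand-in for MySQL latin1),
# then encode the resulting mojibake as UTF-8 a second time.
mojibake = good.decode("cp1252")
double = mojibake.encode("utf-8")
print(double.hex().upper())  # C398C2B9C399E2809E
```

Each original byte becomes two or three bytes, which is why the double-encoded hex is so much longer than the clean D8xx/D9xx form.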

Since your output has some Arabic and some black diamonds, I suspect things are even worse off. I discuss black diamonds in "Trouble with UTF-8 characters; what I see is not what I stored", but your use case may be worse than the cases that it handles.
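For context, a black diamond is U+FFFD, the replacement character that a UTF-8 decoder emits for bytes that are not valid UTF-8. A minimal Python sketch of the general mechanism, reusing a word from the German error message in the question and assuming, purely for illustration, that it was saved as latin1 and then displayed as UTF-8:

```python
# A word from the question's error message, containing a non-ASCII letter.
word = "möglich"

# In latin1 the letter ö is the single byte F6, which is never valid in UTF-8.
latin1_bytes = word.encode("latin-1")

# Decoding as UTF-8 with replacement mimics what a browser does on a
# charset=utf-8 page: the bad byte is shown as U+FFFD (the black diamond).
shown = latin1_bytes.decode("utf-8", errors="replace")
print(shown)
```

This shows only how a diamond can arise; it does not by itself explain why some Arabic letters in the question survive while others do not.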

Upvotes: 1
