DanielZuo

Reputation: 33

UTF8 encoded strings not shown correctly in MySQL

I have written a crawler that scrapes data from a website whose charset is UTF-8. But when I store the contents in MySQL, some special characters (such as Spanish letters) are not shown correctly.

Here is what I have done:

  1. Sent header("Content-Type: text/html; charset=utf-8") from PHP
  2. Set all character sets in MySQL to utf8 (collation utf8_unicode_ci)
  3. Run $conn->query("SET NAMES 'utf8'") right after connecting
  4. Double-checked that the HTML I parsed was encoded in UTF-8
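
One way to back up step 4 is to validate the scraped bytes themselves instead of trusting the charset the page declares. This is a minimal sketch, assuming the mbstring extension is loaded; the helper name is_valid_utf8 and the sample strings are hypothetical and not part of the original crawler.

```php
<?php
// Hypothetical helper: true only if $bytes is a well-formed UTF-8 sequence.
function is_valid_utf8(string $bytes): bool
{
    return mb_check_encoding($bytes, 'UTF-8');
}

// "señor" encoded as UTF-8 (ñ = 0xC3 0xB1) and as ISO-8859-1 (ñ = 0xF1).
$utf8   = "se\xC3\xB1or";
$latin1 = "se\xF1or";

var_dump(is_valid_utf8($utf8));   // bool(true)
var_dump(is_valid_utf8($latin1)); // bool(false)
```

If the check fails for real pages, the data was probably not UTF-8 when it reached the crawler, whatever the HTTP header or meta tag claims.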

So what are some potential problems here?

Upvotes: 2

Views: 472

Answers (3)

ggenglish

Reputation: 1678

I remember pulling my hair out in dealing with UTF8 issues until I started adding this to my header:

setlocale(LC_ALL, 'en_US.UTF-8');

Upvotes: 0

Luca Borrione

Reputation: 17032

Maybe your crawler uses string functions that are not multibyte-aware.
For example, strlen instead of mb_strlen.

Try putting:

mb_internal_encoding("UTF-8");

as the first line of your PHP code, and then check whether you need to replace some functions with their mb_ equivalents. Have a look at the multibyte string reference.
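
To illustrate the difference between the byte-oriented functions and their mb_ counterparts, here is a small sketch. It assumes the mbstring extension is available; the sample word is only an illustration.

```php
<?php
mb_internal_encoding('UTF-8');

$word = "se\xC3\xB1or"; // "señor" as UTF-8 bytes; ñ takes two bytes

$byteLength = strlen($word);    // counts bytes
$charLength = mb_strlen($word); // counts characters using the internal encoding

echo $byteLength, "\n"; // 6
echo $charLength, "\n"; // 5

// substr() can cut a character in half; mb_substr() cuts on character boundaries.
$brokenPrefix = substr($word, 0, 3);    // "se" plus a dangling 0xC3 byte
$cleanPrefix  = mb_substr($word, 0, 3); // "señ"
```

A string truncated with substr() like $brokenPrefix is no longer valid UTF-8, which is one way a crawler can corrupt text before it ever reaches MySQL.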

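As a last resort you may try the iconv function just before inserting the string into MySQL.
Something like:

// The candidate list is an assumption; list the encodings your sources may actually use
$utf8_string = iconv(mb_detect_encoding($string, "UTF-8, ISO-8859-1", true), "UTF-8", $string);

should do the trick.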
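
The conversion step can be sketched a little more defensively. The version below does not try to detect the encoding; it leaves valid UTF-8 alone and converts everything else from a source encoding you supply. It assumes the iconv and mbstring extensions are loaded; the helper name to_utf8 and the ISO-8859-1 default are hypothetical choices, not something the original crawler is known to need.

```php
<?php
// Hypothetical helper: return $text as UTF-8, converting only when needed.
// $assumedSource is a guess about the page's real encoding, not a detected value.
function to_utf8(string $text, string $assumedSource = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8; converting again would double-encode
    }
    $converted = iconv($assumedSource, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $assumedSource");
    }
    return $converted;
}

echo bin2hex(to_utf8("\xF1")), "\n";     // c3b1 (ISO-8859-1 ñ converted)
echo bin2hex(to_utf8("\xC3\xB1")), "\n"; // c3b1 (already UTF-8, unchanged)
```

The guard matters because running a conversion on text that is already UTF-8 is a common cause of the garbled characters described in the question.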

Upvotes: 1

troelskn

Reputation: 117615

Start by checking whether the data is stored incorrectly in the database, in which case the problem is in your crawler. Otherwise the problem is in your presentation layer.

To test this, I would suggest that you use a dedicated MySQL client (such as the command-line client) to inspect the data.
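
When inspecting the data, comparing raw bytes is more reliable than looking at rendered characters, because the client's own charset settings can hide or create garbling. In the mysql client, SELECT HEX(your_column) shows the stored bytes (your_column is a placeholder). The sketch below, which assumes the mbstring extension, shows what a correctly stored "ñ" looks like next to a double-encoded one.

```php
<?php
$correct = "\xC3\xB1"; // "ñ" as UTF-8

// Simulate double encoding: UTF-8 bytes misread as ISO-8859-1 and encoded again.
$doubleEncoded = mb_convert_encoding($correct, 'UTF-8', 'ISO-8859-1');

echo bin2hex($correct), "\n";       // c3b1
echo bin2hex($doubleEncoded), "\n"; // c383c2b1
```

If the database holds c383c2b1 where you expect c3b1, the text was converted once too often before the insert, which points at the crawler or the connection charset rather than the presentation.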

Upvotes: 1
