James

Reputation: 6008

PHP character encoding problems

I need help with a character encoding problem that I want to sort once and for all. Here is an example of some content which I pull from an XML feed, insert into my database and then pull out.

As you can see, a lot of special HTML characters get corrupted.

How can I once and for all stop this? How am I able to support all types of characters, etc.?

I've tried every piece of code I can find. Sometimes it fixes most characters, but others are still corrupted.

Upvotes: 6

Views: 11592

Answers (8)

Seb

Reputation: 25157

Did you try utf8_encode() and utf8_decode()?

Which one you use depends entirely on how your data is encoded, which you don't specify, but they are quite useful in this kind of case.
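A minimal sketch of what these two functions do, assuming the source data is ISO-8859-1 (the only non-UTF-8 encoding that `utf8_encode()` and `utf8_decode()` handle). Both functions are deprecated as of PHP 8.2, so the sketch uses `mb_convert_encoding()` from the mbstring extension instead. The helper names are hypothetical.

```php
<?php
// Hypothetical helpers; require the mbstring extension.

// Like utf8_encode(): ISO-8859-1 bytes to UTF-8 bytes.
function latin1_to_utf8(string $s): string
{
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

// Like utf8_decode(): UTF-8 bytes to ISO-8859-1 bytes.
function utf8_to_latin1(string $s): string
{
    return mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');
}

$latin1 = "caf\xE9";               // "café" in ISO-8859-1
$utf8   = latin1_to_utf8($latin1); // "caf\xC3\xA9"

// Converting text that is already UTF-8 double-encodes it ("cafÃ©"),
// a common cause of this kind of corruption.
$broken = latin1_to_utf8($utf8);   // "caf\xC3\x83\xC2\xA9"
```

The last line shows why applying the conversion blindly can make things worse: it only helps when the input really is ISO-8859-1.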

Upvotes: 0

Petrunov

Reputation: 774

To absolutely once and for all make sure you will never have problems with encoding again:

Use UTF-8 everywhere and on everything!

That is (if you use mysql and php):

  • Set all the tables in your database to collation "utf8_general_ci" for example.
  • Once you establish the database connection, run the following SQL query: "SET NAMES 'utf8'"
  • Always make sure the settings of your editor are set to UTF-8 encoding.
  • Have the following meta tag in the <head> section of your HTML documents:

    <meta http-equiv="content-type" content="text/html; charset=utf-8">
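On the PHP side, the connection step from the list above might look like the following sketch. It assumes PDO with the MySQL driver, where a `charset` option in the DSN has the same effect as running `SET NAMES`. The helper name, host, database name, and credentials are placeholders. The default of `utf8mb4` is also an assumption: MySQL's `utf8` stores at most three bytes per character, while `utf8mb4` covers all of Unicode.

```php
<?php
// Hypothetical helper: builds a PDO DSN that fixes the connection charset.
// Host, database name, and credentials below are placeholders.
function mysql_dsn(string $host, string $db, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $db, $charset);
}

$dsn = mysql_dsn('localhost', 'example_db');

// Connecting is left commented out because it needs a live server:
// $pdo = new PDO($dsn, 'example_user', 'example_password', [
//     PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
// ]);
```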

And couple of bonus tips:

OR:

You can just use one simple server-side configuration file that takes care of all the encoding settings. In this case you won't need header calls, meta tags, or changes to php.ini at all. Just add the character set you want to a .htaccess file and put it in your www root. If you want to manipulate character sets in your PHP code, that's another story. The database collation must of course be correct.
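As a rough illustration of the .htaccess approach, assuming an Apache server that allows this directive to be overridden in .htaccess files (`AllowOverride FileInfo`):

```apache
# Adds "charset=UTF-8" to the Content-Type header of
# text/plain and text/html responses.
AddDefaultCharset UTF-8
```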

Footnote: UTF-8 is not the encoding solution; it is a solution. It doesn't matter which character set or encoding you use, as long as the whole environment has been taken into consideration.

Upvotes: 12

Christian

Reputation: 123

After you connect to the database, but before you do any transactions, execute the following line which makes sure all database communication is in UTF-8:

mysql_query("SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'", $dbconn);
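The `mysql_*` functions were removed in PHP 7.0, so on current PHP versions the same idea needs another extension. Below is a sketch using mysqli, assuming an already-open connection. The helper name and credentials are hypothetical.

```php
<?php
// Hypothetical helper, assuming an already-open mysqli connection.
function use_utf8_connection(mysqli $db, string $charset = 'utf8'): void
{
    // set_charset() sets the client, connection, and results character
    // sets together, and also tells real_escape_string() which encoding
    // is in use.
    if (!$db->set_charset($charset)) {
        throw new RuntimeException('Could not set charset: ' . $db->error);
    }
}

// Usage (placeholder credentials, needs a running MySQL server):
// $db = new mysqli('localhost', 'example_user', 'example_password', 'example_db');
// use_utf8_connection($db);
```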

Upvotes: 1

John

Reputation: 483

It looks like the link you gave has data that is encoded in UTF-8. (Follow that link, then change the encoding of your browser to UTF-8.)

It sounds like you are having problems with inserting into and retrieving from your database. Make sure your database table has UTF-8 set as the encoding.

Upvotes: 1

James

Reputation: 6008

header('Content-type: text/html; charset=UTF-8') ;

/**
 * Encodes HTML safely for UTF-8. Use instead of htmlentities. 
 *
 * @param string $var 
 * @return string 
 */
function html_encode($var)
{
    return htmlentities($var, ENT_QUOTES, 'UTF-8');
}

Those two rescued me and I think it is now working. I'll come back if I continue to encounter problems. Should I store it in the DB, e.g. as "&amp;" or as "&"?

Upvotes: 0

Gumbo

Reputation: 655825

It seems that a UTF-8 encoded text is being interpreted as ISO 8859-1.

If you’re processing XML documents, you have to use the encoding given either in the charset parameter of the HTTP header field Content-Type or in the encoding attribute of the XML declaration. If neither is given, the XML specification declares UTF-8 or UTF-16 as the default character encoding and you have to use some detection.
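A minimal sketch of the XML-declaration part of that rule, assuming the feed is available as a raw byte string and the mbstring extension is loaded. The function names are hypothetical. The sketch ignores the HTTP Content-Type header, byte order marks, and UTF-16 input, so it is a starting point rather than full detection.

```php
<?php
// Hypothetical helper; requires the mbstring extension.
// Reads the encoding attribute of the XML declaration, if any, and
// falls back to UTF-8 when none is declared.
function xml_declared_encoding(string $xml): string
{
    $pattern = '/^<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']/';
    if (preg_match($pattern, $xml, $m)) {
        return strtoupper($m[1]);
    }
    return 'UTF-8';
}

// Converts the raw feed bytes to UTF-8 based on the declared encoding.
// Note: the declaration text itself is left unchanged, and an encoding
// name that mbstring does not know will raise an error.
function xml_to_utf8(string $xml): string
{
    $from = xml_declared_encoding($xml);
    return $from === 'UTF-8' ? $xml : mb_convert_encoding($xml, 'UTF-8', $from);
}

$feed = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><t>caf\xE9</t>";
$utf8 = xml_to_utf8($feed);
```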

Upvotes: 1

Jordan S. Jones

Reputation: 13903

First off, make sure your database's character encoding is set to UTF-8. Secondly, PHP's iconv extension is going to be your friend. Finally, ensure that your response headers are sending the proper character encoding (again, UTF-8).
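For the iconv step, a small sketch follows. The source encoding (Windows-1252) is an assumption chosen for illustration; the real value depends on where your data comes from.

```php
<?php
// Requires the iconv extension. Windows-1252 as the source encoding is
// an assumption for illustration only.
$cp1252 = "\x80 100";                           // "€ 100" in Windows-1252
$utf8   = iconv('WINDOWS-1252', 'UTF-8', $cp1252);

// iconv() returns false on failure (for example, on invalid input bytes),
// so check the result before storing it.
if ($utf8 === false) {
    throw new RuntimeException('Conversion to UTF-8 failed');
}
```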

Upvotes: 0

Paul G.

Reputation: 464

My favorite article about encodings, from Joel on Software: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets

Upvotes: 3
