Reputation: 6008
I need help with a character encoding problem that I want to sort out once and for all. Here is an example of some content which I pull from an XML feed, insert into my database, and then pull out.
As you can see, a lot of special HTML characters get corrupted/broken.
How can I stop this once and for all? How can I support all types of characters?
I've tried every piece of code I can find. Some of it fixes most of the characters, but others are still corrupted.
Upvotes: 6
Views: 11592
Reputation: 25157
Did you try utf8_encode() and utf8_decode()?
Which one you use depends entirely on how your data is encoded, which you don't specify: utf8_encode() converts ISO-8859-1 text to UTF-8, and utf8_decode() converts UTF-8 back to ISO-8859-1. They are quite useful in cases like this.
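To show why the direction matters, here is a sketch of what utf8_encode() does, written out byte by byte as a hypothetical latin1_to_utf8() helper: it assumes every input byte is an ISO-8859-1 character. Feeding it text that is already UTF-8 produces the classic "é" becomes "Ã©" corruption. Note that utf8_encode() and utf8_decode() are deprecated as of PHP 8.2; mb_convert_encoding() is the usual replacement.

```php
<?php
// Hypothetical helper mirroring utf8_encode(): treat each byte as an
// ISO-8859-1 character and emit that character's UTF-8 encoding.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b); // ASCII bytes are identical in both encodings
        } else {
            // U+0080..U+00FF take two bytes in UTF-8: 110xxxxx 10xxxxxx
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$latin1 = "caf\xE9";                // "café" stored as ISO-8859-1
$utf8   = latin1_to_utf8($latin1);  // "caf\xC3\xA9": correct UTF-8
$broken = latin1_to_utf8($utf8);    // "caf\xC3\x83\xC2\xA9": displays as "cafÃ©"
echo bin2hex($utf8), "\n", bin2hex($broken), "\n";
```

Converting twice is just as harmful as not converting at all, so the function has to match the encoding the data actually arrives in.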
Upvotes: 0
Reputation: 774
To absolutely once and for all make sure you will never have problems with encoding again:
Use UTF-8 everywhere and on everything!
That is (if you use MySQL and PHP):
Have the following meta tag in the <head> section of your HTML documents:
<meta http-equiv="content-type" content="text/html; charset=utf-8">
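On the PHP side, the same declaration can be made both in the HTTP header and in the markup. A minimal sketch; render_page() is a hypothetical helper, and it assumes the PHP source file itself is saved as UTF-8.

```php
<?php
// Declare UTF-8 in the HTTP header (silently ignored on the CLI).
header('Content-Type: text/html; charset=utf-8');

// Hypothetical helper: wrap already-escaped body HTML in a page whose <head>
// declares UTF-8, escaping the title as UTF-8 text.
function render_page(string $title, string $bodyHtml): string
{
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html>\n<head>\n"
        . "<meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\">\n"
        . "<title>{$safeTitle}</title>\n"
        . "</head>\n<body>\n{$bodyHtml}\n</body>\n</html>\n";
}

echo render_page('Café & Crème', '<p>naïve résumé</p>');
```

Because the page is UTF-8 throughout, htmlspecialchars() is enough here: accented characters are output as-is and only the HTML-special characters are escaped.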
And a couple of bonus tips:
OR:
You can just use one simple server-side configuration file that takes care of all the encoding stuff. In this case you won't need header and/or meta tags at all, or any php.ini modification. Just add the character set you want to an .htaccess file (with Apache, for example, AddDefaultCharset UTF-8) and put it into your www root. If you want to fiddle with character set strings in your PHP code, that's another story. The database collation must of course be correct.
Footnote: UTF-8 is not the only encoding solution; it is one solution. It doesn't matter which character set/encoding you use, as long as the whole environment has been taken into consideration.
Upvotes: 12
Reputation: 123
After you connect to the database, but before you do any transactions, execute the following line which makes sure all database communication is in UTF-8:
mysql_query("SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'", $dbconn);
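The mysql_* functions used above were removed in PHP 7.0. A minimal sketch of the same idea with PDO, assuming a MySQL server: the charset=utf8mb4 option in the DSN sets the connection character set (client, connection and results, much like SET NAMES) and also tells the client library which charset is in use. The helper names, host and database are placeholders; with mysqli the equivalent call is mysqli_set_charset($link, 'utf8mb4').

```php
<?php
// Hypothetical helper: build a PDO DSN that requests utf8mb4 for the connection.
function mysql_utf8_dsn(string $host, string $dbname): string
{
    return "mysql:host={$host};dbname={$dbname};charset=utf8mb4";
}

// Opening the connection needs a reachable server, so it is only defined here.
function connect_utf8(string $host, string $dbname, string $user, string $pass): PDO
{
    return new PDO(mysql_utf8_dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // throw on SQL errors
    ]);
}

echo mysql_utf8_dsn('localhost', 'feed'), "\n";
```

utf8mb4 is used rather than utf8 because MySQL's utf8 is a 3-byte alias (utf8mb3) that cannot store characters such as emoji.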
Upvotes: 1
Reputation: 483
It looks like the link you gave has data that is encoded in UTF-8. (Follow that link, then change your browser's encoding to UTF-8.)
It sounds like you are having problems with inserting into and retrieving from your database. Make sure your database table has UTF-8 set as its encoding.
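If a table was created with another charset, it can be converted in place. A sketch using a hypothetical helper that builds the MySQL statement; the table name is a placeholder. Back up first: CONVERT TO re-encodes the existing data, so if a latin1 column actually holds UTF-8 bytes, converting it this way double-encodes them.

```php
<?php
// Hypothetical helper: build the statement that converts one table (and all of
// its text columns) to utf8mb4, quoting the table name as a MySQL identifier.
function convert_table_sql(string $table): string
{
    // Backticks inside an identifier are escaped by doubling them.
    $quoted = '`' . str_replace('`', '``', $table) . '`';
    return "ALTER TABLE {$quoted} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";
}

echo convert_table_sql('feed_items'), "\n";
// With an open PDO connection $pdo: $pdo->exec(convert_table_sql('feed_items'));
```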
Upvotes: 1
Reputation: 6008
header('Content-type: text/html; charset=UTF-8') ;
/**
* Encodes HTML safely for UTF-8. Use instead of htmlentities.
*
* @param string $var
* @return string
*/
function html_encode($var)
{
return htmlentities($var, ENT_QUOTES, 'UTF-8');
}
Those two rescued me and I think it is now working. I'll come back if I continue to encounter problems. Should I store it in the DB as, e.g., "&amp;" or as "&"?
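One common convention, shown here as a sketch rather than the only option, is to store the raw "&" and escape only when writing HTML. Escaping before storage risks escaping twice, as this demonstrates with htmlentities() and its inverse html_entity_decode(). htmlentities() also accepts a fourth double_encode argument; passing false leaves existing entities alone.

```php
<?php
$raw = 'Tom & Jerry'; // what the database would hold under this convention

$once  = htmlentities($raw, ENT_QUOTES, 'UTF-8');  // "Tom &amp; Jerry": renders as "Tom & Jerry"
$twice = htmlentities($once, ENT_QUOTES, 'UTF-8'); // "Tom &amp;amp; Jerry": renders as "Tom &amp; Jerry"

// If escaped text was already stored, decode it before escaping again.
$restored = html_entity_decode($once, ENT_QUOTES, 'UTF-8'); // back to "Tom & Jerry"

echo $once, "\n", $twice, "\n", $restored, "\n";
```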
Upvotes: 0
Reputation: 655825
It seems that UTF-8 encoded text is being interpreted as ISO 8859-1.
If you're processing XML documents, you have to use the encoding given either in the charset parameter of the HTTP Content-Type header field or in the encoding attribute of the XML declaration. If neither is given, the XML specification declares UTF-8 or UTF-16 as the default character encoding, and you have to use some detection.
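That precedence can be sketched as a small helper. detect_xml_encoding() is hypothetical, and its BOM check is a simplification of the autodetection described in Appendix F of the XML 1.0 specification. PHP's libxml-based parsers (SimpleXML, DOM) read the XML declaration themselves, but they never see the HTTP header, so the header's charset has to be applied by your own code.

```php
<?php
// Precedence: 1. charset parameter of the HTTP Content-Type header,
// 2. encoding attribute of the XML declaration,
// 3. UTF-16 if a UTF-16 byte order mark is present, else UTF-8.
function detect_xml_encoding(?string $contentType, string $xml): string
{
    if ($contentType !== null
        && preg_match('/;\s*charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    // Optional UTF-8 BOM, then <?xml ... encoding="...".
    if (preg_match('/^(?:\xEF\xBB\xBF)?<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']/', $xml, $m)) {
        return strtoupper($m[1]);
    }
    $bom = substr($xml, 0, 2);
    if ($bom === "\xFE\xFF" || $bom === "\xFF\xFE") {
        return 'UTF-16';
    }
    return 'UTF-8';
}

echo detect_xml_encoding('text/xml; charset=ISO-8859-1', '<?xml version="1.0"?><a/>'), "\n";
echo detect_xml_encoding(null, '<?xml version="1.0" encoding="windows-1252"?><a/>'), "\n";
```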
Upvotes: 1
Reputation: 13903
First off, make sure your database's character encoding is set to UTF-8. Secondly, PHP's iconv functions are going to be your friend. Finally, make sure your response headers send the proper character encoding (again, UTF-8).
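A minimal sketch of using iconv() to normalize incoming text to UTF-8 before it goes into the database. ensure_utf8() is a hypothetical helper, and the ISO-8859-1 fallback is an assumption: a validity check can tell you the bytes are not UTF-8, but not which encoding they really are, so prefer the feed's declared encoding when you have it.

```php
<?php
// Pass valid UTF-8 through unchanged; otherwise convert from a fallback encoding.
function ensure_utf8(string $s, string $fallback = 'ISO-8859-1'): string
{
    // preg_match() with the /u modifier returns 1 only for valid UTF-8 subjects.
    if (preg_match('//u', $s) === 1) {
        return $s;
    }
    $converted = iconv($fallback, 'UTF-8', $s);
    if ($converted === false) {
        throw new RuntimeException("iconv could not convert from {$fallback}");
    }
    return $converted;
}

echo ensure_utf8("caf\xE9"), "\n";     // ISO-8859-1 input, converted
echo ensure_utf8("caf\xC3\xA9"), "\n"; // already UTF-8, unchanged
```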
Upvotes: 0
Reputation: 464
My favorite article about encodings from JoelOnSoftware: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets
Upvotes: 3