Reputation: 5905
I am parsing text/html from web pages into an XML feed. The text/html is encoded as ISO-8859-1, while the XML feed must be UTF-8. I have used htmlentities, but I am having to manually replace loads of characters. Here is what I have so far (it still does not handle all of the text):
$desc = str_replace(array("\n", "\r", "\r\n"),"",$desc);
$desc = str_replace(array("’","‘","”","“"),"'",$desc);
$desc = str_replace("£","£",$desc);
$desc = str_replace("é","é",$desc);
$desc = str_replace("²","2",$desc);
$desc = str_replace(array("-","•"),"‐",$desc);
$desc = htmlentities($desc, ENT_QUOTES, "UTF-8");
Upvotes: 0
Views: 1754
Reputation: 449465
Use iconv(). It will allow you to use native characters in UTF-8 as well, so there is no need for HTML entities.
$data = iconv("ISO-8859-1", "UTF-8", $text);
When converting from UTF-8 to another character set, append //IGNORE or //TRANSLIT to the target charset name to drop or transliterate characters that cannot be represented there.
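As a rough sketch of both directions (the sample strings and variable names here are made up for illustration, and what //TRANSLIT and //IGNORE actually produce depends on the iconv implementation PHP was built against):

```php
<?php
// Hypothetical input: the byte 0xE9 is "é" in ISO-8859-1.
$latin1 = "caf\xE9";

// ISO-8859-1 -> UTF-8: every Latin-1 byte maps to a Unicode code point,
// so no suffix is needed in this direction.
$utf8 = iconv("ISO-8859-1", "UTF-8", $latin1);

// UTF-8 -> ISO-8859-1: the euro sign (bytes E2 82 AC) has no Latin-1 equivalent,
// so a policy suffix is appended to the *target* charset name.
$text = "price: 5 \xE2\x82\xAC";

// Ask iconv to approximate the character (exact result is implementation-dependent).
$translit = @iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text);

// Ask iconv to drop the character; some builds return false here instead,
// so the result should be checked before use.
$ignored = @iconv("UTF-8", "ISO-8859-1//IGNORE", $text);
```

For the question's case (ISO-8859-1 in, UTF-8 out) only the first call is needed.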
Alternatively, the mb_* functions shown by @Gumbo will work as well.
Upvotes: 5
Reputation: 655239
You can also use utf8_encode or mb_convert_encoding:
$desc = utf8_encode($desc);
// OR
$desc = mb_convert_encoding($desc, 'UTF-8', 'ISO-8859-1');
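Both will convert the encoding from ISO 8859-1 to UTF-8. Note that utf8_encode is deprecated as of PHP 8.2, so mb_convert_encoding is the safer choice on current versions.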
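To tie this back to the question, here is a minimal sketch of how the chain of str_replace calls might be replaced. The helper name to_xml_text is hypothetical, it assumes the mbstring extension is available, and using htmlspecialchars with ENT_XML1 for the final escaping step is my suggestion rather than something from the original code:

```php
<?php
// Hypothetical helper: convert a string to UTF-8 and escape it for use as XML text.
function to_xml_text(string $raw, string $from = 'ISO-8859-1'): string
{
    // Convert the bytes to UTF-8 first, so accented characters become real UTF-8.
    $utf8 = mb_convert_encoding($raw, 'UTF-8', $from);

    // Strip line breaks, as the question's code does.
    $utf8 = str_replace(["\r\n", "\r", "\n"], '', $utf8);

    // Escape only the XML-special characters (&, <, >, quotes);
    // characters such as é and £ stay as native UTF-8.
    return htmlspecialchars($utf8, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// 0xE9 is "é" and 0xA3 is "£" in ISO-8859-1.
echo to_xml_text("Caf\xE9 & bar: \xA35\r\n"); // prints "Café &amp; bar: £5"
```

If the pages contain curly quotes or bullets, as the question's replacements suggest, the source may actually be Windows-1252 rather than ISO-8859-1, in which case passing 'Windows-1252' as $from is worth trying.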
Upvotes: 1