Liam Bailey

Reputation: 5905

ISO-8859-1 to XML-acceptable UTF-8

I am parsing text/html from web pages into an XML feed. The text/html is encoded as ISO-8859-1, while the XML feed must be UTF-8. I have used HTML entities, but I am still having to replace loads of characters manually. Here is what I have so far (it still does not handle all of the text):

$desc = str_replace(array("\n", "\r", "\r\n"), "", $desc);
$desc = str_replace(array("’", "‘", "”", "“"), "'", $desc);
$desc = str_replace("£", "£", $desc);
$desc = str_replace("é", "é", $desc);
$desc = str_replace("²", "2", $desc);
$desc = str_replace(array("-", "•"), "‐", $desc);
$desc = htmlentities($desc, ENT_QUOTES, "UTF-8");

Upvotes: 0

Views: 1754

Answers (2)

Pekka

Reputation: 449465

Use iconv(). It lets you keep the characters as native UTF-8, so there is no need for HTML entities.

$data = iconv("ISO-8859-1", "UTF-8", $text);
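To show how that call could fit into building the feed, here is a minimal sketch. The helper name latin1_to_xml_text() and the sample input are made up for illustration, and it assumes the input really is ISO-8859-1. After converting, only the XML-reserved characters need escaping, which htmlspecialchars() with ENT_XML1 does:

```php
<?php
// Hypothetical helper (not part of the original answer): convert
// ISO-8859-1 text to UTF-8 and escape it for use inside an XML element.
function latin1_to_xml_text(string $latin1): string
{
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
    if ($utf8 === false) {
        throw new RuntimeException('iconv() could not convert the input');
    }
    // ENT_XML1 escapes only XML-reserved characters (&, <, >, quotes);
    // characters such as é and £ stay as native UTF-8.
    return htmlspecialchars($utf8, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// "\xE9" is é and "\xA3" is £ in ISO-8859-1 (sample data, made up).
$desc = latin1_to_xml_text("Caf\xE9 & bar, \xA35");
echo '<description>' . $desc . '</description>', "\n";
```

With this approach the manual str_replace() list for accented letters and currency signs should no longer be needed, since those characters are valid UTF-8 in XML as they are.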

When converting from UTF-8 to another character set, append //IGNORE or //TRANSLIT to the target charset name to drop or transliterate characters that cannot be represented there.
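A rough sketch of that reverse direction follows. The helper utf8_to_latin1() is hypothetical, and what happens to unrepresentable characters depends on the iconv implementation PHP was built against (some transliterate, some substitute "?", some fail and return false), so the code only checks for false rather than assuming a particular result:

```php
<?php
// Hypothetical helper: convert UTF-8 back to ISO-8859-1. $mode is
// 'TRANSLIT' (approximate unrepresentable characters) or 'IGNORE'
// (drop them); it is appended to the target charset name.
// Returns the converted string, or false if iconv() fails.
function utf8_to_latin1(string $utf8, string $mode = 'TRANSLIT')
{
    // @ hides the notice iconv() raises on failure; callers check for false.
    return @iconv('UTF-8', 'ISO-8859-1//' . $mode, $utf8);
}

// é exists in ISO-8859-1, so this converts cleanly to the byte 0xE9.
$plain = utf8_to_latin1("caf\xC3\xA9");
var_dump($plain === "caf\xE9");

// The euro sign does not exist in ISO-8859-1; the result here is
// implementation-dependent, so inspect it instead of relying on it.
$euro = utf8_to_latin1("\xE2\x82\xAC 5");
var_dump($euro);
```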

Alternatively, the mb_* functions shown by @Gumbo will work as well.

Upvotes: 5

Gumbo

Reputation: 655239

You can also use utf8_encode or mb_convert_encoding:

$desc = utf8_encode($desc);
// OR
$desc = mb_convert_encoding($desc, 'UTF-8', 'ISO-8859-1');

Both will convert the encoding from ISO 8859-1 to UTF-8.
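For anyone wanting something a bit more defensive, here is a small sketch built on mb_convert_encoding() (it needs the mbstring extension; utf8_encode() is deprecated as of PHP 8.2, so it is left out). The helper name to_utf8() is made up. The Windows-1252 option is an assumption on my part: the curly quotes and bullet in the question's str_replace() list do not exist in ISO-8859-1, which hints that the pages may really be Windows-1252.

```php
<?php
// Hypothetical helper: convert text from a single-byte source charset to
// UTF-8 and confirm that the result is well-formed UTF-8.
function to_utf8(string $text, string $from = 'ISO-8859-1'): string
{
    $utf8 = mb_convert_encoding($text, 'UTF-8', $from);
    if ($utf8 === false || !mb_check_encoding($utf8, 'UTF-8')) {
        throw new RuntimeException("Conversion from $from did not yield valid UTF-8");
    }
    return $utf8;
}

// "\xE9" is é in ISO-8859-1.
$accented = to_utf8("caf\xE9");

// Assumption: the source is actually Windows-1252, where "\x92" is the
// right single quotation mark (U+2019).
$quoted = to_utf8("it\x92s", 'Windows-1252');

echo $accented, "\n", $quoted, "\n";
```

If the Windows-1252 guess is right, the curly quotes convert correctly on their own and the manual replacements for them can go.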

Upvotes: 1
