pano

Reputation: 13

Convert parsed text, with php, to utf-8

As a follow-up to my previous question about parsing images and text from complex XML, the only remaining problem is that I don't get the right encoding. The text is in Greek, and the XML file is UTF-8 encoded. This is the code that parses the XML:

$xml = simplexml_load_file('myfile.xml');

$descriptions = $xml->xpath('//item/description');

foreach ($descriptions as $description_node) {

    $description_dom = new DOMDocument();
    $description_dom->loadHTML((string) $description_node);

    $description_sxml = simplexml_import_dom($description_dom);

    $imgs = $description_sxml->xpath('//img');
    $text = $description_sxml->xpath('//div');

    foreach ($imgs as $image) {
        echo (string) $image['src'];
    }

    foreach ($text as $t) {
        echo (string) $t;
    }
}

If I echo $description_node, the text looks fine, but after I load it into $description_dom and import it with simplexml_import_dom, it looks like this: Ïε ιÏÎ»Î±Î¼Î¹ÎºÎ­Ï ÎºÎ¿Î¹Î½ÏÏηÏεÏ. Using mb_convert_encoding turns it into: ýÃÂñù" ÃÂ. What am I doing wrong?

Upvotes: 0

Views: 971

Answers (3)

pano

Reputation: 13

Solution: after $description_dom = new DOMDocument(); I added this line:

$description_html = mb_convert_encoding((string) $description_node, 'HTML-ENTITIES', 'UTF-8');

This encodes the non-ASCII UTF-8 characters as HTML entities, so loadHTML cannot misinterpret their bytes. Instead of

$description_dom->loadHTML( (string)$description_node );

I now load the converted HTML:

$description_dom->loadHTML( $description_html );
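
For readers on newer PHP versions, here is a minimal sketch of the same idea using mb_encode_numericentity, since the 'HTML-ENTITIES' pseudo-encoding of mb_convert_encoding is deprecated as of PHP 8.2. The sample markup and file name in $description_node are made up for illustration and stand in for the string value of one description node. It assumes the mbstring, dom and simplexml extensions are loaded.

```php
<?php
// Hypothetical sample standing in for the string value of one <description> node.
$description_node = '<div>Γειά σου κόσμε</div><img src="photo.jpg">';

// Encode every code point above ASCII as a numeric entity.
// Convmap layout: [first code point, last code point, offset, mask].
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$description_html = mb_encode_numericentity($description_node, $convmap, 'UTF-8');

$description_dom = new DOMDocument();
// The input is now plain ASCII, so the parser's default charset no longer matters.
$description_dom->loadHTML($description_html);

$description_sxml = simplexml_import_dom($description_dom);

// Collect the div texts and image sources instead of echoing them one by one.
$texts = [];
foreach ($description_sxml->xpath('//div') as $t) {
    $texts[] = (string) $t;
}

$srcs = [];
foreach ($description_sxml->xpath('//img') as $image) {
    $srcs[] = (string) $image['src'];
}

echo implode("\n", array_merge($texts, $srcs)), "\n";
```

The strings read back from the DOM are UTF-8, so the page that prints them still has to be served with a UTF-8 charset declaration.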

Upvotes: 1

Esailija

Reputation: 140234

Do not convert anything. Just print it with the proper declaration:

header("Content-Type: text/plain; charset=utf-8");

That is all you need to do. Put it at the top of your file, before any output is sent.

Upvotes: 0

user1362916

Reputation: 119

Add this to the head of the HTML page where you want the text to be displayed:

<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>

This should render the characters properly.

Upvotes: 0
