Reputation: 13
Following up on my previous question about parsing images and text from complex XML: the only remaining problem is that I don't get the right encoding. The text is in Greek, and the XML file is encoded as UTF-8.
This is the code that parses the XML:
$xml = simplexml_load_file('myfile.xml');
$descriptions = $xml->xpath('//item/description');
foreach ( $descriptions as $description_node ) {
    $description_dom = new DOMDocument();
    $description_dom->loadHTML( (string)$description_node );
    $description_sxml = simplexml_import_dom( $description_dom );
    $imgs = $description_sxml->xpath('//img');
    $text = $description_sxml->xpath('//div');
    foreach($imgs as $image){
        echo (string)$image['src'];
    }
    foreach($text as $t){
        echo (string)$t;
    }
}
If I echo $description_node, the text looks fine, but after I import $description_dom with simplexml_import_dom it looks like this:
Ïε ιÏλαμικÎÏ ÎºÎ¿Î¹Î½ÏÏηÏεÏ.
Using mb_convert_encoding turns it into:
ýÃÂñù" ÃÂ
What am I doing wrong?
Upvotes: 0
Views: 971
Reputation: 13
Solution: right after $description_dom = new DOMDocument(); I added this line:
$description_html = mb_convert_encoding((string)$description_node, 'HTML-ENTITIES', "UTF-8");
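This converts the non-ASCII UTF-8 characters to HTML entities, so the markup handed to loadHTML no longer depends on how the parser guesses the encoding. Instead of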
$description_dom->loadHTML( (string)$description_node );
I now load the converted HTML:
$description_dom->loadHTML( (string)$description_html );
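For anyone on a newer PHP version: the 'HTML-ENTITIES' pseudo-encoding of mb_convert_encoding is deprecated as of PHP 8.2. Below is a minimal sketch of the same idea using mb_encode_numericentity instead. The sample markup and the names $map and $ascii_html are assumptions for illustration, not part of my original code. As far as I understand it, the garbling happens because loadHTML falls back to a single-byte encoding when the fragment does not declare a charset, so feeding it ASCII-only markup sidesteps the guess.

```php
<?php
// Hypothetical sample standing in for the content of one <description> node.
$description_html = '<div>Καλημέρα</div><img src="a.jpg">';

// Turn every code point from U+0080 upward into a numeric entity (&#NNN;),
// leaving the ASCII markup untouched. Map format: [start, end, offset, mask].
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$ascii_html = mb_encode_numericentity($description_html, $map, 'UTF-8');

$description_dom = new DOMDocument();
// The input is now pure ASCII, so the parser's encoding guess no longer matters.
$description_dom->loadHTML($ascii_html);

$description_sxml = simplexml_import_dom($description_dom);
$divs = $description_sxml->xpath('//div');
$imgs = $description_sxml->xpath('//img');

// The DOM hands strings back as UTF-8, so the Greek text survives the round trip.
$text = (string) $divs[0];
$src  = (string) $imgs[0]['src'];

echo $text, "\n", $src, "\n";
```

The rest of the loop (the xpath queries for //img and //div) stays the same as in the question.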
Upvotes: 1
Reputation: 140234
Do not convert anything; just print it with the proper declaration:
header("Content-Type: text/plain; charset=utf-8");
This is all you need to do. Do it at the top of your file.
Upvotes: 0
Reputation: 119
Add this to the head of the HTML page where you want the text to be displayed:
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
This should render the characters properly.
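A minimal sketch of what that could look like when the page is generated from PHP. The helper name render_utf8_page and the sample text are hypothetical, and this only addresses how the browser decodes the output; it assumes the string is already valid UTF-8 by the time it is echoed.

```php
<?php
// Hypothetical helper: wraps already-UTF-8 text in a page that declares its charset.
function render_utf8_page(string $text): string
{
    // Escape markup characters; passing 'UTF-8' keeps multibyte text intact.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html><head>\n"
        . "<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>\n"
        . "</head><body><div>{$safe}</div></body></html>\n";
}

// Send the matching HTTP header only if no output has gone out yet.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

$sample = 'Καλημέρα';
$page = render_utf8_page($sample);
echo $page;
```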
Upvotes: 0