Reputation: 610
I'm handling UTF-8 data from a UTF-8 database and I'm having trouble with the encoding.
Raw content is extracted correctly from the DB, and I do see "é" in my UTF-8 terminal:
Site de la Préfecture de Police
Then, when I pass this content through PHP functions that operate on the DOM, I see this:
Site de la Pr&Atilde;&copy;fecture de Police
We can see that DOM read `Ã©` instead of `é` and replaced `Ã©` with the HTML entities `&Atilde;&copy;`.
Afterwards, when I pass the content through the DOM functions again, I get another strange conversion:
Site de la Pr%C3%A9fecture de Police
Now it looks like the hexadecimal encoding of the UTF-8 bytes of é: %C3 %A9
Do you know what's happening?
Upvotes: 1
Views: 8987
Reputation: 610
OK, found it!
Two PHP functions were involved in the problem:
html_entity_decode was working in ISO-8859-1
$dom->loadHTML($xml) was working in ASCII
I fixed it by setting the desired charset:
html_entity_decode( $newContent, ENT_NOQUOTES, 'UTF-8' );
$dom->loadHTML('<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body>' . $xml . '</body></html>');
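For anyone who wants to see the mechanism, here is a minimal sketch, assuming the `mbstring` and `dom` extensions are loaded. The helper names `misreadAsLatin1` and `textViaDom` are made up for illustration and are not part of my real code. It shows how UTF-8 bytes read as ISO-8859-1 turn into the entities and the percent-encoding from the question, and that the two fixes above round-trip the text unchanged.

```php
<?php
// "é" written as its two UTF-8 bytes, to make the byte-level view explicit.
$raw = "Site de la Pr\xC3\xA9fecture de Police";

// Hypothetical helper: simulate a function that wrongly treats UTF-8 input
// as ISO-8859-1. Each input byte becomes one character (0xC3 -> Ã, 0xA9 -> ©).
function misreadAsLatin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

// Hypothetical helper: load an HTML fragment with an explicit UTF-8 charset
// declaration (the fix from this answer) and return the body text.
function textViaDom(string $fragment): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $dom->loadHTML(
        '<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body>'
        . $fragment . '</body></html>'
    );
    libxml_clear_errors();
    return $dom->getElementsByTagName('body')->item(0)->textContent;
}

// Symptom 1: the misread text, once entity-encoded, gives &Atilde;&copy;
echo htmlentities(misreadAsLatin1($raw), ENT_NOQUOTES, 'UTF-8'), "\n";

// Symptom 2: percent-encoding exposes the two raw UTF-8 bytes as %C3%A9
echo rawurlencode("\xC3\xA9"), "\n";

// Fix 1: tell html_entity_decode which charset to produce.
echo html_entity_decode('Pr&eacute;fecture', ENT_NOQUOTES, 'UTF-8'), "\n";

// Fix 2: with the charset declared, the text survives the DOM round trip.
echo textViaDom('<p>' . $raw . '</p>'), "\n";
```

As far as I know, newer PHP versions (5.4 and later) already default `html_entity_decode` to UTF-8, so passing the charset explicitly mainly matters on older installs, but it does no harm either way.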
Upvotes: 4
Reputation: 508
Try the charset iso-8859-1 instead of UTF-8, or be sure to set the charset in your HTML header:
PHP : header('Content-type: text/html; charset=utf-8');
HTML: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Upvotes: 3