Franck

Reputation: 610

PHP DOM reads "Ã©" instead of "é"

I'm handling UTF-8 data from a UTF-8 database and I'm having trouble with the encoding.

  1. Raw content is extracted correctly from the DB, and I do see "é" in my UTF-8 terminal:

    Site de la Préfecture de Police
  2. Then, when I pass this content through PHP functions that operate on the DOM, I get this:

    Site de la Pr&Atilde;&copy;fecture de Police
  3. We can see that DOM read `Ã©` instead of `é` and replaced `Ã©` with the HTML entities `&Atilde;&copy;`.

  4. After that, I pass it through the DOM functions again and get another strange conversion:

    Site de la Pr%C3%A9fecture de Police

Now it looks like the hexadecimal encoding of é: %C3 %A9

Do you know what's happening?

Upvotes: 1

Views: 8987

Answers (2)

Franck

Reputation: 610

OK, found it!

Two PHP functions were involved in the problem:

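  • html_entity_decode was working in ISO-8859-1 (its default charset before PHP 5.4)
  • $dom->loadHTML($xml) was not reading the input as UTF-8 (with no charset declared, the HTML parser falls back to a single-byte encoding, ISO-8859-1, rather than ASCII)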
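
To make the symptoms from the question concrete, here is a small sketch of what happens when the two UTF-8 bytes of "é" are read as single-byte characters. It assumes the mbstring extension is available and uses mb_convert_encoding() only to simulate the misreading; rawurlencode() is there just to show the bytes, not because the original code necessarily calls it.

```php
<?php
// "é" written as its two UTF-8 bytes, so the encoding of this file does not matter.
$utf8 = "Pr\xC3\xA9fecture";

// Simulate a reader that treats the input as ISO-8859-1:
// each byte becomes its own character, "Ã" (0xC3) and "©" (0xA9).
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

// Writing the misread text out with named entities gives the second symptom.
$entities = htmlentities($misread, ENT_NOQUOTES, 'UTF-8');

// Percent-encoding the original bytes gives the third symptom.
$percent = rawurlencode($utf8);

echo $misread, "\n";   // PrÃ©fecture
echo $entities, "\n";  // Pr&Atilde;&copy;fecture
echo $percent, "\n";   // Pr%C3%A9fecture
```

So all three strange forms are the same two bytes, C3 A9, seen through the wrong charset.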

I fixed it by setting the desired charset:

  • html_entity_decode( $newContent, ENT_NOQUOTES, 'UTF-8' );
  • $dom->loadHTML('<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body>' . $xml . '</body></html>');
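
Put together, the two fixes could look roughly like this. It is a minimal sketch: the sample markup and the variable names ($text, $decoded) are only for illustration and are not from my real code.

```php
<?php
// Sample content with "é" as UTF-8 bytes (illustrative, taken from the question's string).
$xml = "<p>Site de la Pr\xC3\xA9fecture de Police</p>";

$dom = new DOMDocument();
// Declare the charset ahead of the content so the HTML parser reads it as UTF-8.
$dom->loadHTML(
    '<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body>'
    . $xml
    . '</body></html>'
);

// Text read back from the DOM is returned as UTF-8.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;

// Pass the charset explicitly when decoding entities.
$decoded = html_entity_decode('Pr&eacute;fecture', ENT_NOQUOTES, 'UTF-8');

echo $text, "\n", $decoded, "\n";
```

With the meta tag in place, $text comes back with a proper "é" instead of "Ã©".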

Upvotes: 4

LAL

Reputation: 508

Try the charset iso-8859-1 instead of UTF-8, or make sure the charset is set in your HTML header:

PHP : header('Content-type: text/html; charset=utf-8');
HTML: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Upvotes: 3
