user191688

Reputation: 2659

PHP getElementsByTagName with nodeValue returns evil characters

I have some UTF-8 HTML like this:

<a href="http://example.com">Today&nbsp;11:12&nbsp;AM</a>

And getElementsByTagName('a')->item(0)->nodeValue returns this:

Today 11:12 AM

I am not having any problems with other nodes in this HTML.

What am I doing wrong?

Upvotes: 0

Views: 231

Answers (2)

user191688

Reputation: 2659

The source documents come from ASP pages served by IIS.

I ended up using this for the offending characters:

str_replace( chr(), chr(), $html);
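For illustration only, here is a minimal sketch of this kind of replacement. It assumes the offending characters are the non-breaking spaces produced by &nbsp;, which in UTF-8 are the two bytes 0xC2 0xA0. The variable names and the sample string are hypothetical and are not the values used in the call above.

```php
<?php
// Assumption: the "evil" characters are non-breaking spaces from &nbsp;.
// In UTF-8, U+00A0 is encoded as the two-byte sequence 0xC2 0xA0.
$nbsp = "\xC2\xA0";

// Hypothetical sample standing in for the string that nodeValue returns.
$text = "Today{$nbsp}11:12{$nbsp}AM";

// Replace each non-breaking space with an ordinary space.
// str_replace() returns the new string, so the result must be assigned.
$clean = str_replace($nbsp, ' ', $text);

echo $clean, PHP_EOL;
```

Note that str_replace() does not modify its input in place, so the return value has to be kept.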

Upvotes: 0

Headshota

Reputation: 21439

Try explicitly setting the encoding for the DOMDocument object:

$dom = new DOMDocument('1.0', 'UTF-8');
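A short sketch of how this might be tried with the markup from the question, together with a check of the bytes that nodeValue actually returns. The variable names are illustrative. Whether the constructor argument alone changes the result may depend on how the source page declares its encoding, so treat this as a way to inspect the output rather than a guaranteed fix.

```php
<?php
// Sample markup taken from the question.
$html = '<a href="http://example.com">Today&nbsp;11:12&nbsp;AM</a>';

// Set the version and encoding explicitly, as suggested above.
$dom = new DOMDocument('1.0', 'UTF-8');

// loadHTML() accepts a fragment and wraps it in <html><body> itself.
$dom->loadHTML($html);

$value = $dom->getElementsByTagName('a')->item(0)->nodeValue;

// DOM strings are returned as UTF-8, so each &nbsp; shows up as the
// bytes c2 a0 in the hex dump.
echo bin2hex($value), PHP_EOL;
```

If the hex dump contains c2a0 but the browser shows stray characters, the page displaying the value is probably not being served as UTF-8.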

Upvotes: 0
