Marcus Harrison

Reputation: 869

PHP's DOMDocument appears to not recognize certain HTML entities, how can I include them in my output?

It appears that DOMDocument doesn't recognize certain HTML entities:

<?php

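$html = '<body>&amp; &bigstar;</body>';
$doc = new DOMDocument('1.0', 'UTF-8');
// Parse the fragment with libxml2's HTML parser (it wraps it in <html>).
$doc->loadHTML($html);
// Serialize the root <html> element back to markup.
echo $doc->saveHTML($doc->documentElement);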

https://3v4l.org/rLirt

In the demo code above, the ampersand & (&amp;) is encoded correctly, while the star ★ (&bigstar;) is converted to &amp;bigstar;. DOMDocument doesn't throw any warnings or errors; it appears to accept &bigstar; as a valid HTML entity and yet still converts the leading ampersand into its own entity.

Which HTML entities does the PHP DOM extension's loadHTML not understand? Is there a way to prevent it from escaping the leading ampersand of these entities?

Upvotes: 2

Views: 289

Answers (1)

Alex Kapustin

Reputation: 2459

Remove html_entity_decode.

When you save the document, you will get:

<html><body>&amp; &amp;bigstar;</body></html>

which is correct. When you read the value from the node:

echo $doc->getElementsByTagName('body')->item(0)->nodeValue;

you will get:

& &bigstar;
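To see why the escaped output is correct, here is a minimal sketch (an illustration, not the answer author's code) that builds the same text with createTextNode() instead of parsing it. The DOM holds the literal characters &bigstar;, so the serializer has to escape the & to keep the text from being read back as an entity:

```php
<?php
// Build a <p> whose text node holds the literal characters "& &bigstar;".
$doc = new DOMDocument();
$p = $doc->createElement('p');
$p->appendChild($doc->createTextNode('& &bigstar;'));
$doc->appendChild($p);

// Serializing must escape every literal "&", so both become "&amp;".
$html = $doc->saveHTML($p);
echo $html, "\n";

// Reading the node back returns the unescaped text: "& &bigstar;".
echo $p->nodeValue, "\n";
```

In other words, &amp;bigstar; in the saved markup and &bigstar; in nodeValue are two views of the same stored text.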

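Only entities from that list are allowed: https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references. In practice, the libxml2 HTML parser behind loadHTML() has traditionally recognized only the HTML 4 entities on that list, and &bigstar; was added in HTML5.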
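If you need such entities to arrive in the DOM as real characters, one possible workaround (a sketch, not part of the original answer) is to rewrite named entities as numeric character references before calling loadHTML(), since numeric references like &#9733; do not depend on the parser's entity table. The helper namedEntitiesToNumeric() is hypothetical; it assumes PHP 7.4+ with the mbstring extension (for mb_str_split() and mb_ord()) and relies on PHP's own HTML5 entity table via html_entity_decode() with ENT_HTML5. Unlike decoding the whole string, it leaves &amp;, &lt;, &gt;, &quot; and &apos; untouched, so the markup itself is not altered.

```php
<?php
// Hypothetical helper: rewrite named entities such as &bigstar; as numeric
// character references (&#9733;) before handing the markup to loadHTML().
function namedEntitiesToNumeric(string $html): string
{
    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        static function (array $m): string {
            // The basic XML entities are always understood; leave them alone.
            if (in_array($m[1], ['amp', 'lt', 'gt', 'quot', 'apos'], true)) {
                return $m[0];
            }
            // Decode this single entity using PHP's HTML5 entity table.
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            if ($decoded === $m[0]) {
                return $m[0]; // Unknown name: keep it unchanged.
            }
            // Some HTML5 entities expand to more than one code point.
            $refs = '';
            foreach (mb_str_split($decoded, 1, 'UTF-8') as $char) {
                $refs .= '&#' . mb_ord($char, 'UTF-8') . ';';
            }
            return $refs;
        },
        $html
    );
}

$html = namedEntitiesToNumeric('<body>&amp; &bigstar;</body>');
echo $html, "\n"; // <body>&amp; &#9733;</body>

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadHTML($html);
// The star now reaches the DOM as the character U+2605.
echo $doc->getElementsByTagName('body')->item(0)->nodeValue, "\n";
```

The design choice here is to emit numeric references rather than raw UTF-8 characters, which sidesteps loadHTML()'s assumptions about the input encoding when no charset is declared.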

Upvotes: 1
