Reputation: 869
It appears that DOMDocument doesn't recognize certain HTML entities:
<?php
$html = '<body>&amp; &bigstar;</body>';
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadHTML($html);
echo $doc->saveHTML($doc->documentElement);
In the demo code above, the ampersand & (&amp;) is encoded correctly, while the star ★ (&bigstar;) is converted to &amp;bigstar;. DOMDocument doesn't throw any warnings or errors; it appears to recognize &bigstar; as a valid HTML entity and yet still converts the leading ampersand into its own HTML entity.
Which HTML entities does the PHP DOM extension's loadHTML not understand? Is there a way to prevent it from turning the leading ampersands of these entities into ampersand entities?
Upvotes: 2
Views: 289
Reputation: 2459
Remove html_entity_decode.
When you save, the output will be:
<html><body>&amp; &amp;bigstar;</body></html>
which is correct. When you get the value from the node with
echo $doc->getElementsByTagName('body')->item(0)->nodeValue;
you will get:
& &bigstar;
Only entities from this list are allowed: https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
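If the goal is to keep the star itself rather than an escaped entity name, one possible workaround is to turn the named entities into real UTF-8 characters before the markup reaches loadHTML. The sketch below is only an illustration under that assumption: decodeHtml5Entities is a hypothetical helper name, and it leans on PHP's html_entity_decode with ENT_HTML5 to resolve names such as &bigstar;. The five markup-significant entities are left encoded so the structure of the HTML does not change.

```php
<?php
// Hypothetical helper (not part of the DOM extension): decode named
// entities to UTF-8 characters before handing the markup to loadHTML.
function decodeHtml5Entities(string $html): string
{
    // Entities that must stay encoded so the markup keeps its meaning.
    $keep = ['&amp;', '&lt;', '&gt;', '&quot;', '&apos;'];

    return preg_replace_callback(
        '/&[A-Za-z][A-Za-z0-9]*;/',
        static function (array $m) use ($keep): string {
            if (in_array($m[0], $keep, true)) {
                return $m[0];
            }
            // Names that html_entity_decode does not know are returned unchanged.
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $html
    );
}

// The star becomes a literal character; the ampersand stays encoded.
$decoded = decodeHtml5Entities('<body>&amp; &bigstar;</body>');
```

The result can then be passed to loadHTML. Since the string now contains raw UTF-8, the parser also needs to be told the input encoding (prepending an encoding declaration to the markup is a commonly used way to do that), otherwise the star may come back mangled.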
Upvotes: 1