I'm using the following characters for testing: " & ' < > £
My code builds an XML file using PHP and DOMDocument.
<?php
$xml = new DOMDocument();
$xml->formatOutput = true;
$root = $xml->createElement('Start_Of_XML');
$xml->appendChild($root);
// $node, $value, $i and $parent come from the surrounding script (not shown)
$el = $xml->createElement($node,htmlspecialchars(html_entity_decode($value[$i],ENT_QUOTES,'UTF-8'),ENT_QUOTES,'UTF-8'));
$parent->appendChild($el);
?>
With htmlspecialchars() as above, these characters end up in the saved XML as:
" &amp; ' &lt; &gt; £
That is, the double quote, apostrophe and pound sign do not get encoded.
If I adjust the code to use htmlentities() instead:
<?php
$el = $xml->createElement($node,htmlentities(html_entity_decode($value[$i],ENT_QUOTES,'UTF-8'),ENT_QUOTES,'UTF-8'));
?>
The characters are saved as:
" &amp; ' &lt; &gt; &pound;
So the pound sign gets converted along with the rest, but again the quote and apostrophe are not encoded when the XML is saved.
I've searched through several posts but haven't found a solution. What am I missing?
Edit:
Using Gordon's answer as a basis, I got the results I was looking for with something along the lines of https://3v4l.org/ZksrE
Great effort from ThW, though; the answer seems pretty comprehensive. I'm going to accept it as the solution. Thanks.
Answer:
The second argument of DOMDocument::createElement() is broken: it escapes only partially (the PHP manual notes that & has to be escaped manually, otherwise it is treated as the start of an entity reference, and that " is not escaped), and it is not part of the W3C DOM standard. In DOM, text content is a node. You can just create it and append it to the element node. This works with other node types like CDATA sections or comments as well. DOMNode::appendChild() returns the appended node, so you can nest and chain the calls.
Additionally, you can set the DOMElement::$textContent property. This replaces all descendant nodes with a single text node. Do not use DOMElement::$nodeValue; it has the same problems as the second argument of createElement().
$document = new DOMDocument();
$document->formatOutput = true;
$root = $document->appendChild($document->createElement('foo'));
$root
  ->appendChild($document->createElement('one'))
  ->appendChild($document->createTextNode('"foo" & <bar>'));
$root
  ->appendChild($document->createElement('one'))
  ->textContent = '"foo" & <bar>';
$root
  ->appendChild($document->createElement('two'))
  ->appendChild($document->createCDATASection('"foo" & <bar>'));
$root
  ->appendChild($document->createElement('three'))
  ->appendChild($document->createComment('"foo" & <bar>'));
echo $document->saveXML();
Output:
<?xml version="1.0"?>
<foo>
  <one>"foo" &amp; &lt;bar&gt;</one>
  <one>"foo" &amp; &lt;bar&gt;</one>
  <two><![CDATA["foo" & <bar>]]></two>
  <three>
    <!--"foo" & <bar>-->
  </three>
</foo>
The text node and textContent variants escape special characters (like & and <) as needed. Quotes do not need to be escaped in element content, so they won't be. Whether other characters are escaped depends on the document encoding:
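Applied to the question's code, this suggests the htmlspecialchars()/htmlentities() calls can simply be dropped. Below is a minimal sketch, assuming PHP with the dom extension enabled; the helper name appendTextElement() and the element name item are hypothetical, and the sample string is the question's test set. It stores the raw string as a text node, then reloads the saved XML to check that the value comes back unchanged:

```php
<?php
// Hypothetical helper: creates <$name> under $parent and stores $text as a
// text node. No htmlspecialchars()/htmlentities(): DOM escapes on save.
function appendTextElement(DOMNode $parent, string $name, string $text): DOMElement
{
    $document = $parent instanceof DOMDocument ? $parent : $parent->ownerDocument;
    $element = $parent->appendChild($document->createElement($name));
    $element->appendChild($document->createTextNode($text));
    return $element;
}

$text = '" & \' < > £'; // the question's test characters, unescaped

$document = new DOMDocument('1.0', 'UTF-8');
$root = $document->appendChild($document->createElement('Start_Of_XML'));
$item = appendTextElement($root, 'item', $text);

$xml = $document->saveXML();
// Serialize just the element (no XML declaration) to inspect the escaping.
$fragment = $document->saveXML($item);

// Parse the saved XML again and read the value back.
$copy = new DOMDocument();
$copy->loadXML($xml);
$roundTrip = $copy->getElementsByTagName('item')->item(0)->textContent;

echo $fragment, "\n";
var_dump($roundTrip === $text);
```

In the fragment, & and < appear as &amp; and &lt;, while the quote and apostrophe stay literal, which is still well-formed XML.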
$document = new DOMDocument("1.0", "UTF-8");
$document
->appendChild($document->createElement('foo'))
->appendChild($document->createTextNode('äöü'));
echo $document->saveXML();
$document = new DOMDocument("1.0", "ASCII");
$document
->appendChild($document->createElement('foo'))
->appendChild($document->createTextNode('äöü'));
echo $document->saveXML();
Output:
<?xml version="1.0" encoding="UTF-8"?>
<foo>äöü</foo>
<?xml version="1.0" encoding="ASCII"?>
<foo>&#228;&#246;&#252;</foo>
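The earlier statement that setting textContent replaces all descendant nodes with a single text node can be checked with a short sketch, assuming PHP with the dom extension enabled; the element, child and comment names are arbitrary:

```php
<?php
// Give <one> some existing children first.
$document = new DOMDocument('1.0', 'UTF-8');
$element = $document->appendChild($document->createElement('one'));
$element->appendChild($document->createElement('child'));
$element->appendChild($document->createComment('note'));

// Setting textContent discards those children and stores the raw string
// in one text node; escaping only happens when the document is saved.
$element->textContent = '"foo" & <bar>';

var_dump($element->childNodes->length);
var_dump($element->firstChild instanceof DOMText);
echo $document->saveXML($element), "\n";
```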