Reputation: 1354
I am having trouble loading an XML document into the DOM while preserving the difference between empty tags and zero-length strings. Here is an example:
$doc = new DOMDocument("1.0", "utf-8");
$root = $doc->createElement("root");
$doc->appendChild($root);
$element = $doc->createElement("element");
$root->appendChild($element);
echo $doc->saveXML();
This produces the following XML:
<?xml version="1.0" encoding="utf-8"?>
<root><element/></root>
An empty element, exactly as expected. Now let's add an empty text node to the element.
$doc = new DOMDocument("1.0", "utf-8");
$root = $doc->createElement("root");
$doc->appendChild($root);
$element = $doc->createElement("element");
$element->appendChild($doc->createTextNode(""));
$root->appendChild($element);
echo $doc->saveXML();
This produces the following XML:
<?xml version="1.0" encoding="utf-8"?>
<root><element></element></root>
A non-empty element with a zero-length string. Good! But when I try to do:
$doc = new DOMDocument();
$doc->loadXML($xml);
echo $doc->saveXML($doc);
on either of these XML documents, I always get:
<?xml version="1.0" encoding="utf-8"?>
<root><element/></root>
i.e. the zero-length string is removed and just an empty element is loaded. I believe this happens in loadXML(). Is there any way to convince DOMDocument's loadXML() not to convert a zero-length string into an empty element? Ideally the DOM would contain a text node with a zero-length string as the element's child.
The solution needs to stay within PHP DOM because of how the loaded data is processed afterwards.
Upvotes: 2
Views: 3645
Reputation: 198214
The problem with distinguishing between those two is that when DOMDocument loads the serialized XML document, it simply follows the specs.
By the book, in <element></element> there is no empty text node inside that element, which is what others have already commented as well.
However, DOMDocument is perfectly fine with you inserting an empty text node there yourself. Then you can easily distinguish between a self-closing tag (no children) and an empty element (having one child, an empty text node).
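To illustrate that distinction, here is a minimal sketch using only the standard DOM extension. The element names a and b and the variable names are made up for the example; it only shows that an element with an empty text node reports a child, while a plain empty element does not.

```php
<?php
// Build <root><a/><b></b></root> by hand: "a" has no children,
// "b" gets a single zero-length text node.
$doc = new DOMDocument();
$root = $doc->appendChild($doc->createElement('root'));
$selfClosing = $root->appendChild($doc->createElement('a'));
$empty = $root->appendChild($doc->createElement('b'));
$empty->appendChild($doc->createTextNode(''));

// hasChildNodes() is what tells the two cases apart in the DOM.
$aHasChildren = $selfClosing->hasChildNodes();
$bHasChildren = $empty->hasChildNodes();
var_dump($aHasChildren, $bHasChildren);

// The serializer keeps the difference when saving.
$out = $doc->saveXML();
echo $out;
```

This difference only exists in a DOM you built or patched yourself; after a plain loadXML() neither element has a child.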
So how do you get those empty text nodes in? One way is the XMLReader-based XMLReaderIterator library, specifically its DOMReadingIteration, which builds up the document while offering each current XMLReader node for interaction:
// Assumes the XMLReaderIterator library is already loaded (e.g. via your autoloader)
// and that $xml holds the XML string from the question.
$reader = new XMLReader();
$reader->XML($xml);
$doc = new DOMDocument();
$iterator = new DOMReadingIteration($doc, $reader);
foreach ($iterator as $index => $value) {
    // Preserve empty elements as non-self-closing by giving them a single
    // child text node that has zero-length text
    if ($iterator->isEndElementOfEmptyElement()) {
        $iterator->getLastNode()->appendChild(new DOMText(''));
    }
}
echo $doc->saveXML();
For your input:
<?xml version="1.0" encoding="utf-8"?>
<root><element></element></root>
this produces the following output:
<?xml version="1.0"?>
<root><element></element></root>
No strings attached, and a properly built DOMDocument. The example is from examples/read-into-dom.php, and it shows that this is no problem when you load the document via XMLReader and handle that single special case yourself.
Upvotes: 3
Reputation: 11915
You can trick XSLT processors into not using self-closing elements by having an xsl:value-of pretend to insert a value, with that value being the empty string ''.
Input:
<?xml version="1.0" encoding="utf-8"?>
<root>
<foo>
<bar some="value"></bar>
<self-closing attr="foobar" val="3.5"/>
</foo>
<goo>
<gle>
<nope/>
</gle>
</goo>
</root>
Stylesheet:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[not(node())]">
<xsl:copy>
<xsl:for-each select="@*">
<xsl:attribute name="{name()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:for-each>
<xsl:value-of select="''"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Output:
<?xml version="1.0" encoding="utf-8"?>
<root>
<foo>
<bar some="value"></bar>
<self-closing attr="foobar" val="3.5"></self-closing>
</foo>
<goo>
<gle>
<nope></nope>
</gle>
</goo>
</root>
To solve this in PHP without the use of an XSLT processor, I can only think of adding empty text nodes to all elements with no children (like you do when creating the XML).
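A rough sketch of that idea, assuming plain DOMDocument plus DOMXPath. The function name expandEmptyElements and the sample input are made up for illustration. Note that it expands every childless element, so it cannot tell which ones were self-closing in the original file.

```php
<?php
// Hypothetical helper: give every element without child nodes a
// zero-length text node so it is serialized as <x></x> instead of <x/>.
function expandEmptyElements(DOMDocument $doc): void
{
    $xpath = new DOMXPath($doc);
    // Same match as the stylesheet above: elements with no child nodes.
    foreach ($xpath->query('//*[not(node())]') as $element) {
        $element->appendChild($doc->createTextNode(''));
    }
}

$doc = new DOMDocument();
$doc->loadXML('<root><element/><other>text</other></root>');
expandEmptyElements($doc);

$out = $doc->saveXML();
echo $out;
```

Attributes are left untouched, since not(node()) only looks at child nodes.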
Upvotes: 0
Reputation: 19512
There is no difference between the two for the XML parser when loading. The resulting DOM is exactly the same.
If you load/save an XML format that has a problem with empty tags, you can use an option to avoid the empty tags on save:
$dom = new DOMDocument();
$dom->appendChild($dom->createElement('foo'));
echo $dom->saveXml();
echo "\n";
echo $dom->saveXml(NULL, LIBXML_NOEMPTYTAG);
Output:
<?xml version="1.0"?>
<foo/>
<?xml version="1.0"?>
<foo></foo>
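The same option applied to a loaded document, as a small sketch (the input string is made up for the example). It shows that both spellings end up identical after loading, and that the flag then expands all of them on save:

```php
<?php
// One self-closing element and one with explicit start and end tags.
$xml = '<root><a/><b></b></root>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// Default serialization collapses both childless elements.
$default = $doc->saveXML();
// LIBXML_NOEMPTYTAG writes both with explicit end tags.
$expanded = $doc->saveXML(null, LIBXML_NOEMPTYTAG);

echo $default, $expanded;
```

So the option helps when the consumer just cannot handle empty tags, but it does not restore which form the input originally used.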
Upvotes: 2