Vladimir Bashkirtsev

Reputation: 1354

How to distinguish between empty element and null-size string in DOMDocument?

I am having trouble loading an XML document into a DOM while preserving the difference between empty tags and null-size strings. Here is an example:

$doc = new DOMDocument("1.0", "utf-8");

$root = $doc->createElement("root");
$doc->appendChild($root);

$element = $doc->createElement("element");
$root->appendChild($element);

echo $doc->saveXML();

produces the following XML:

<?xml version="1.0" encoding="utf-8"?>
<root><element/></root>

An empty element, exactly as expected. Now let's add an empty text node to the element.

$doc = new DOMDocument("1.0", "utf-8");

$root = $doc->createElement("root");
$doc->appendChild($root);

$element = $doc->createElement("element");
$element->appendChild($doc->createTextNode(""));
$root->appendChild($element);

echo $doc->saveXML();

produces the following XML:

<?xml version="1.0" encoding="utf-8"?>
<root><element></element></root>

A non-empty element containing a null-size string. Good! But when I try to do:

$doc = new DOMDocument();
$doc->loadXML($xml);

echo $doc->saveXML($doc);

on these XML documents I always get

<?xml version="1.0" encoding="utf-8"?>
<root><element/></root>

i.e. the null-size string is removed and only an empty element is loaded. I believe this happens in loadXML(). Is there any way to convince DOMDocument's loadXML() not to convert a null-size string into an empty element? Ideally the DOM would have a text node with a null-size string as the element's child.

The solution needs to use PHP DOM because of how the loaded data is processed afterwards.

Upvotes: 2

Views: 3645

Answers (3)

hakre

Reputation: 198214

The problem with distinguishing between the two is that when DOMDocument loads the serialized XML document, it simply follows the specification.

By the book, <element></element> contains no empty text node, as others have already commented.

However, DOMDocument is perfectly fine if you insert an empty text node there yourself. Then you can easily distinguish between a self-closing tag (no children) and an empty element (one child, an empty text node).
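As a minimal sketch of that distinction (the element names `a` and `b` are made up for illustration), `hasChildNodes()` can tell the two cases apart once the empty text node is in place:

```php
<?php
// Sketch: a childless element vs. an element holding one zero-length text node.
$doc = new DOMDocument('1.0', 'utf-8');
$root = $doc->appendChild($doc->createElement('root'));

// <a/> gets no children at all.
$selfClosing = $root->appendChild($doc->createElement('a'));

// <b></b> gets a single text node with zero-length content.
$emptyString = $root->appendChild($doc->createElement('b'));
$emptyString->appendChild($doc->createTextNode(''));

var_dump($selfClosing->hasChildNodes()); // bool(false)
var_dump($emptyString->hasChildNodes()); // bool(true)

echo $doc->saveXML($root), "\n"; // <root><a/><b></b></root>
```

The same check works on any DOM that was built with the empty text nodes inserted, regardless of how it was built.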

So how do you insert those empty text nodes? For example, by using the XMLReader-based XMLReaderIterator library, specifically its DOMReadingIteration, which builds up the document while offering each current XMLReader node for interaction:

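// Assumes the XMLReaderIterator library is already loaded (e.g. via an autoloader)
// and that $xml holds the document string from the question.
$reader = new XMLReader();
$reader->XML($xml);

$doc = new DOMDocument();

$iterator = new DOMReadingIteration($doc, $reader);

foreach ($iterator as $index => $value) {
    // Preserve empty elements as non-self-closing by making them non-empty:
    // give them a single child text node that has zero-length text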
    if ($iterator->isEndElementOfEmptyElement()) {
        $iterator->getLastNode()->appendChild(new DOMText(''));
    }
}

echo $doc->saveXML();

For your input:

<?xml version="1.0" encoding="utf-8"?>
<root><element></element></root>

it produces this output:

<?xml version="1.0"?>
<root><element></element></root>

No strings attached. A finely built DOMDocument. The example is from examples/read-into-dom.php, and it shows that this is not a problem when you load the document via XMLReader and handle the single special case you have.
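If pulling in the library is not an option, the same idea can be sketched with plain XMLReader, whose `isEmptyElement` property is true for `<element/>` and false for `<element></element>`. The function name `loadPreservingEmptyStrings` is hypothetical, and this is only a rough sketch under simplifying assumptions: it copies elements, attributes and text, and ignores namespaces, comments and processing instructions.

```php
<?php
// Sketch: rebuild a DOM from XMLReader so that <e></e> keeps a zero-length
// text node while <e/> stays childless. Not namespace-aware.
function loadPreservingEmptyStrings(string $xml): DOMDocument
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $doc = new DOMDocument('1.0', 'utf-8');
    $parent = $doc; // node that receives the next child

    while ($reader->read()) {
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                $element = $doc->createElement($reader->name);
                $parent->appendChild($element);
                // Read the flag before moving the cursor onto the attributes.
                $selfClosing = $reader->isEmptyElement;
                while ($reader->moveToNextAttribute()) {
                    $element->setAttribute($reader->name, $reader->value);
                }
                if (!$selfClosing) {
                    $parent = $element; // descend until the matching end tag
                }
                break;

            case XMLReader::TEXT:
            case XMLReader::CDATA:
            case XMLReader::WHITESPACE:
            case XMLReader::SIGNIFICANT_WHITESPACE:
                if ($parent !== $doc) { // text is not allowed at document level
                    $parent->appendChild($doc->createTextNode($reader->value));
                }
                break;

            case XMLReader::END_ELEMENT:
                if (!$parent->hasChildNodes()) {
                    // <e></e>: mark it with a zero-length text node
                    $parent->appendChild($doc->createTextNode(''));
                }
                $parent = $parent->parentNode;
                break;
        }
    }

    $reader->close();
    return $doc;
}

$doc = loadPreservingEmptyStrings('<root><a/><b></b></root>');
echo $doc->saveXML($doc->documentElement), "\n"; // <root><a/><b></b></root>
```

An END_ELEMENT event is only reported for elements written with separate start and end tags, which is why checking for missing children at that point is enough to find the `<e></e>` case.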

Upvotes: 3

CodeManX

Reputation: 11915

You can trick XSLT processors into not using self-closing elements by pretending to insert a value with xsl:value-of, where that value is just the empty string ''.

Input:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <foo>
    <bar some="value"></bar>
    <self-closing attr="foobar" val="3.5"/>
  </foo>
  <goo>
    <gle>
      <nope/>
    </gle>
  </goo>
</root>

Stylesheet:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*[not(node())]">
        <xsl:copy>
            <xsl:for-each select="@*">
                <xsl:attribute name="{name()}">
                    <xsl:value-of select="."/>
                </xsl:attribute>
            </xsl:for-each>
            <xsl:value-of select="''"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Output:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <foo>
    <bar some="value"></bar>
    <self-closing attr="foobar" val="3.5"></self-closing>
  </foo>
  <goo>
    <gle>
      <nope></nope>
    </gle>
  </goo>
</root>

To solve this in PHP without an XSLT processor, I can only think of adding empty text nodes to all elements with no children (as you do when creating the XML).
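A minimal sketch of that PHP-only approach, assuming it is acceptable to treat every childless element the same way (after loadXML() the original `<nope/>` and `<bar></bar>` forms can no longer be told apart):

```php
<?php
// Sketch: give every childless element a zero-length text node after loading.
$doc = new DOMDocument();
$doc->loadXML('<root><bar some="value"></bar><nope/></root>');

$xpath = new DOMXPath($doc);

// Same match as the stylesheet's *[not(node())] template.
foreach ($xpath->query('//*[not(node())]') as $element) {
    $element->appendChild($doc->createTextNode(''));
}

echo $doc->saveXML($doc->documentElement), "\n";
// <root><bar some="value"></bar><nope></nope></root>
```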

Upvotes: 0

ThW

Reputation: 19512

There is no difference for the XML parser when loading. The DOM is exactly the same.
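A quick way to check this, as a small sketch using the two documents from the question:

```php
<?php
// Sketch: load both serializations and inspect the resulting element.
$results = [];
foreach (['<root><element/></root>', '<root><element></element></root>'] as $xml) {
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $element = $doc->documentElement->firstChild;
    $results[] = $element->hasChildNodes();
}

var_dump($results); // both entries are false: no text node in either case
```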

If you load/save an XML format that has a problem with empty tags, you can use an option to avoid the empty tags on save:

$dom = new DOMDocument();
$dom->appendChild($dom->createElement('foo'));

echo $dom->saveXML();
echo "\n";
echo $dom->saveXML(null, LIBXML_NOEMPTYTAG);

Output:

<?xml version="1.0"?>
<foo/>

<?xml version="1.0"?>
<foo></foo>

Upvotes: 2
