Line breaks produce Text nodes in DOMDocument - how should I handle this

Question

I'm using PHP's DOMDocument library and reading an XML string with loadXML. I am then iterating over the children of a node tagged "Info" with this code:

$doc = new \DOMDocument();
$doc->loadXML(
'

 
  3.2
  2013-10
  2014-10-10
  12:28:28
  GAEB Zertifizierung
  BVBS
 
'
);

$Info = $doc->getElementsByTagName("Info");

foreach ($Info as $element) {
    echo "[". $element->nodeName. "]";
    $nodes = $element->childNodes;
    foreach ($nodes as $node) {
        echo "[" . $node->nodeName . "]";
        echo  $node->nodeValue;
    }
}

This node has 6 children, but the loop runs 13 times. That's because the whitespace between the elements is interpreted as Text nodes. If I look at each node's $node->nodeType, it shows 1 for the 6 real children and 3 for the 7 children whose content is only whitespace. How am I supposed to deal with this? Is it OK that the DOMDocument contains those text nodes, so that I should skip them with something like if ($node->nodeType === 3) continue;, or should I remove the whitespace earlier, when loading the XML? Just removing the line breaks from the input XML doesn't work, because then the spaces between the nodes (e.g. > <) are interpreted as Text nodes.

Ruslan Osmanov · Accepted Answer

Blank nodes can be ignored with the LIBXML_NOBLANKS option as follows:

$doc->loadXML($xml, LIBXML_NOBLANKS);
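
For comparison, here is a minimal sketch of both approaches from the question: dropping the blank nodes at parse time with LIBXML_NOBLANKS, and keeping them but skipping anything that is not an element while iterating. The element names (Root, Version, VersionDate, Date, Time, System, Name) are assumptions made up for illustration, since the tags of the original sample are not shown; only "Info" comes from the question.

```php
<?php
// Hypothetical element names: only "Info" is taken from the question.
$xml = <<<XML
<Root>
 <Info>
  <Version>3.2</Version>
  <VersionDate>2013-10</VersionDate>
  <Date>2014-10-10</Date>
  <Time>12:28:28</Time>
  <System>GAEB Zertifizierung</System>
  <Name>BVBS</Name>
 </Info>
</Root>
XML;

// Option 1: let the parser drop whitespace-only text nodes.
$doc = new DOMDocument();
$doc->loadXML($xml, LIBXML_NOBLANKS);
$info = $doc->getElementsByTagName('Info')->item(0);
$withFlag = $info->childNodes->length;

// Option 2: parse as-is and skip every node that is not an element.
$raw = new DOMDocument();
$raw->loadXML($xml);
$rawInfo = $raw->getElementsByTagName('Info')->item(0);
$values = [];
foreach ($rawInfo->childNodes as $node) {
    if ($node->nodeType !== XML_ELEMENT_NODE) {
        continue; // the whitespace between elements shows up as XML_TEXT_NODE (3)
    }
    $values[$node->nodeName] = $node->nodeValue;
}

echo "children with LIBXML_NOBLANKS: {$withFlag}\n";
echo "children without the flag: {$rawInfo->childNodes->length}\n";
print_r($values);
```

Either way works. The flag is shorter if the whitespace never matters; the nodeType check is probably the safer choice if the same document is saved again later and its formatting should stay untouched.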
