Reputation: 2411
I'm using PHP's DOMDocument library to read an XML string with loadXML. I then iterate over the children of the node tagged "Info" with this code:
$doc = new \DOMDocument();
$doc->loadXML(
'<?xml version="1.0" encoding="UTF-8"?>
<GAEB xmlns="http://www.gaeb.de/GAEB_DA_XML/DA31/3.2">
<Info>
<Version>3.2</Version>
<VersDate>2013-10</VersDate>
<Date>2014-10-10</Date>
<Time>12:28:28</Time>
<ProgSystem>GAEB Zertifizierung</ProgSystem>
<ProgName>BVBS</ProgName>
</Info>
</GAEB>'
);
$Info = $doc->getElementsByTagName("Info");
foreach ($Info as $element) {
    echo "[" . $element->nodeName . "]";
    $nodes = $element->childNodes;
    foreach ($nodes as $node) {
        echo "[" . $node->nodeName . "]";
        echo $node->nodeValue;
    }
}
This node has 6 children, but the loop runs 13 times. That's because the whitespace between the elements is interpreted as text nodes. If I look at each node's $node->nodeType, it shows 1 for the 6 real children and 3 for the 7 children whose content is \n. The question now is: how am I supposed to deal with this? Is it OK that the DOMDocument contains those text nodes, so that I should skip them with something like if ($node->nodeType === 3) continue; or should I try to remove the whitespace earlier, when loading the XML? Just removing the \n from the input XML doesn't work, because then the spaces between the nodes (e.g. > <) are interpreted as text nodes.
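For reference, the skip-while-iterating idea described above might look like the following minimal sketch. The two-element XML string is a shortened stand-in for the real document, and $values is a hypothetical name used only for illustration. It compares against the XML_ELEMENT_NODE constant (value 1) instead of testing for 3, so comments and other non-element nodes would be skipped as well.

```php
<?php
// Shortened stand-in for the real document (assumption: no namespace needed here).
$doc = new DOMDocument();
$doc->loadXML("<Info>\n<Version>3.2</Version>\n<Date>2014-10-10</Date>\n</Info>");

$values = [];
foreach ($doc->documentElement->childNodes as $node) {
    // Whitespace between elements shows up as XML_TEXT_NODE (3); keep only elements (1).
    if ($node->nodeType !== XML_ELEMENT_NODE) {
        continue;
    }
    $values[$node->nodeName] = $node->nodeValue;
}

print_r($values);
```

The document itself still contains the whitespace text nodes here; they are only ignored during iteration.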
Upvotes: 3
Views: 380
Reputation: 21492
Blank nodes can be ignored with the LIBXML_NOBLANKS option as follows:
$doc->loadXML($xml, LIBXML_NOBLANKS);
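A self-contained sketch of that call, assuming a shortened version of the XML from the question (three children instead of six). The $info and $names variables are hypothetical names introduced only for this illustration.

```php
<?php
// Shortened version of the question's document, with indentation between elements.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<GAEB xmlns="http://www.gaeb.de/GAEB_DA_XML/DA31/3.2">
  <Info>
    <Version>3.2</Version>
    <VersDate>2013-10</VersDate>
    <Date>2014-10-10</Date>
  </Info>
</GAEB>
XML;

$doc = new DOMDocument();
// LIBXML_NOBLANKS asks the parser to drop whitespace-only text nodes while loading.
$doc->loadXML($xml, LIBXML_NOBLANKS);

$info = $doc->getElementsByTagName('Info')->item(0);

// Collect the names of the remaining child nodes.
$names = [];
foreach ($info->childNodes as $node) {
    $names[] = $node->nodeName;
}

echo implode(', ', $names), PHP_EOL;
```

With the blank nodes dropped at load time, the inner loop should only see the element children. According to the PHP manual, setting $doc->preserveWhiteSpace = false before loading has the same effect as passing this option.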
Upvotes: 3