Reputation: 4414
I found this code to remove empty nodes from an XML file, but it isn't working correctly. It leaves an empty node that really needs to be removed. Yes, it is empty; it contains only whitespace.
$domxml = new DOMDocument('1.0');
$domxml->preserveWhiteSpace = false;
$domxml->formatOutput = true;
$domxml->loadXML($this->response);
$this->response = $domxml->saveXML($domxml->documentElement);
Anyone know of a better way to do this?
Upvotes: 0
Views: 268
Reputation: 19502
In other words, you would like to remove any element node that has no text content, no attributes, and no descendants with text content or attributes, and that has a parent element node (that is, it is not the document element).
XPath has a function, normalize-space(), that collapses any whitespace sequence to a single space and strips whitespace from the start and end. Whitespace-only content therefore results in an empty string.
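As a quick check of that behaviour, here is a minimal sketch that evaluates normalize-space() through DOMXPath. The sample document and its element names (r, a, b) are made up for illustration only.

```php
<?php
// Hypothetical sample document, used only to demonstrate normalize-space().
$document = new DOMDocument();
$document->loadXML('<r><a>  x   y </a><b>   </b></r>');
$xpath = new DOMXPath($document);

// String-typed XPath expressions come back from evaluate() as PHP strings.
$collapsed = $xpath->evaluate('normalize-space(/r/a)');
$blank     = $xpath->evaluate('normalize-space(/r/b)');

var_dump($collapsed); // string(3) "x y"
var_dump($blank);     // string(0) ""
```

The second result is why comparing against an empty string is enough to detect whitespace-only elements.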
The expression //* fetches every element node in the document as a node list. You just need to add conditions:
normalize-space(.) = ""
not(@*)
not(.//node()[normalize-space(.) != ""])
not(.//*[@*])
parent::*
Put together:
$xml = <<<'XML'
<foo>
<bar></bar>
<bar>123</bar>
<bar foo="123"></bar>
<bar><foo> </foo></bar>
<bar><!-- test --></bar>
</foo>
XML;
$document = new DOMDocument();
$document->preserveWhiteSpace = FALSE;
$document->formatOutput = TRUE;
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$expression =
  '//*[
    normalize-space(.) = "" and
    not(@*) and
    not(.//node()[normalize-space(.) != ""]) and
    not(.//*[@*]) and
    parent::*
  ]';
$nodes = $xpath->evaluate($expression);
// Iterate in reverse so descendants are removed before their ancestors.
for ($i = $nodes->length - 1; $i >= 0; $i--) {
  $node = $nodes->item($i);
  $node->parentNode->removeChild($node);
}
echo $document->saveXml();
Output:
<?xml version="1.0"?>
<foo>
  <bar>123</bar>
  <bar foo="123"/>
  <bar>
    <!-- test -->
  </bar>
</foo>
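If this is needed in more than one place, the same steps could be wrapped in a small helper. This is only a sketch: removeEmptyElements() is a hypothetical name, not part of the DOM extension, and the return value (the number of removed elements) is an addition for convenience.

```php
<?php
/**
 * Sketch: remove elements that have no non-blank text, no attributes,
 * and no descendants with attributes or non-blank content (comments count).
 * removeEmptyElements() is a hypothetical helper name.
 */
function removeEmptyElements(DOMDocument $document): int
{
    $xpath = new DOMXPath($document);
    $nodes = $xpath->query(
        '//*[
            normalize-space(.) = "" and
            not(@*) and
            not(.//node()[normalize-space(.) != ""]) and
            not(.//*[@*]) and
            parent::*
        ]'
    );

    $removed = 0;
    // Walk the list backwards so nested matches are detached first.
    for ($i = $nodes->length - 1; $i >= 0; $i--) {
        $node = $nodes->item($i);
        $node->parentNode->removeChild($node);
        $removed++;
    }
    return $removed;
}

// Same sample data as above, written on one line.
$document = new DOMDocument();
$document->preserveWhiteSpace = false;
$document->loadXML(
    '<foo><bar></bar><bar>123</bar><bar foo="123"></bar>'
    . '<bar><foo> </foo></bar><bar><!-- test --></bar></foo>'
);

$removed = removeEmptyElements($document);
$result  = $document->saveXML($document->documentElement);
echo $result, "\n";
```

The element with an attribute and the element holding only a comment are kept; dropping the not(@*) or not(.//node()[...]) conditions would make the filter more aggressive.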
Upvotes: 1
Reputation: 107652
For a generalized solution that removes ALL empty nodes, consider XSLT. Specifically, use an empty template (one that copies nothing to the output) matched to every element in the document with * and a predicate that tests for an empty string value, [.=''].
See the XSLT Fiddle demo, which uses the top PHP and XSLT Stack Overflow users as input: each topusers node has at least one empty child, and those children are removed entirely in the result.
XSLT (save as .xsl)
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output version="1.0" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*"/>

  <!-- Identity Transform to Copy Document as is -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty Template to Remove Empty Nodes -->
  <xsl:template match="*[.='']"/>
</xsl:transform>
PHP (if needed, enable the php_xsl extension in php.ini)
// LOAD XML
$xml = new DOMDocument('1.0', 'UTF-8');
$xml->load('Input.xml');
// LOAD XSLT
$xsl = new DOMDocument('1.0', 'UTF-8');
$xsl->load('XSLT_Script.xsl');
// INITIALIZE TRANSFORMER
$proc = new XSLTProcessor;
$proc->importStylesheet($xsl);
// RUN TRANSFORMATION
$newXML = $proc->transformToXML($xml);
// OUTPUT NEW TREE AND SAVE IT TO FILE
echo $newXML;
file_put_contents('Output.xml', $newXML);
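To try the stylesheet without creating files, the following sketch loads both documents from strings. It assumes the XSL extension is enabled; the sample XML is borrowed from the XPath answer above and the variable names are illustrative. Note that *[.=''] tests only the string value of an element, so it also removes empty elements that carry attributes or contain only comments, which is stricter than the XPath filter shown earlier.

```php
<?php
// Assumption: the XSL extension (XSLTProcessor) is available.
$xslSource = <<<'XSL'
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="*[.='']"/>
</xsl:transform>
XSL;

// Sample input reused from the XPath example for comparison.
$xmlSource = '<foo><bar></bar><bar>123</bar><bar foo="123"></bar>'
    . '<bar><foo> </foo></bar><bar><!-- test --></bar></foo>';

$xml = new DOMDocument();
$xml->loadXML($xmlSource);

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);

// transformToXml() returns the result as a string (or false on failure).
$newXML = $proc->transformToXml($xml);
echo $newXML;
```

If attribute-only elements should survive, one option is to tighten the match, for example to *[.='' and not(@*)], though that variant is not part of the original answer.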
Upvotes: 0