Priyank

Reputation: 1328

PHP Remove Empty Node Values From XML

I have generated an XML document. It contains a few empty nodes that I want to remove.

My XML

https://pastebin.com/wzjmZChU

I want to remove all empty nodes from my XML. Using XPath, I tried:

$xpath = '//*[not(node())]';
foreach ($xml->xpath($xpath) as $remove) {
    unset($remove[0]);
}

The code above works up to a point, but it does not remove all of the empty nodes.

Edit

I have tried the code above, and it only works for a single level.

Upvotes: 1

Views: 969

Answers (1)

ThW

Reputation: 19492

If you consider any element node without child nodes to be empty, //*[not(node())] will select them. But removing those element nodes can leave their parents empty in turn, so you need an expression that selects not only the currently empty element nodes but also those that contain only empty descendants (recursively). You might also want to avoid removing the document element, even if it is empty, because that would result in an invalid document.
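
To make the problem concrete, here is a minimal sketch of a single pass with the expression from the question. The input string is a hypothetical sample, not the asker's document.

```php
<?php
// Hypothetical sample input, chosen only to show the nesting problem.
$xml = new SimpleXMLElement('<foo><bar><empty/></bar><bar>text</bar></foo>');

// Single pass with the expression from the question.
foreach ($xml->xpath('//*[not(node())]') as $remove) {
    unset($remove[0]);
}

// <empty/> has been removed, but its parent <bar> did not match the
// expression (it had a child at the time) and is now empty itself.
echo $xml->asXML();
```

The first <bar> survives as an empty element, which matches the "single level" behaviour described in the question.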

Building up the expression

  • Select the document element
    /*
  • Any descendant of the document element
    /*//*
  • ...with only whitespace as text content (this includes the text of its descendants)
    /*//*[normalize-space(.) = ""]
  • ...and no attributes
    /*//*[normalize-space(.) = "" and not(@*)]
  • ...and no descendants with attributes
    /*//*[normalize-space(.) = "" and not(@* or .//*[@*])]
  • ...and no comments
    /*//*[normalize-space(.) = "" and not(@* or .//*[@*] or .//comment())]
  • ...and no processing instructions
    /*//*[normalize-space(.) = "" and not(@* or .//*[@*] or .//comment() or .//processing-instruction())]

Put together

Iterate the result in reverse order, so that child nodes are deleted before parents.

$xmlString = <<<'XML'
<foo>
  <empty/>
  <empty></empty>
  <bar><empty/></bar>
  <bar attr="value"><empty/></bar>
  <bar>text</bar>
  <bar>
   <empty/>
   text
  </bar>
  <bar>
   <!-- comment -->
  </bar>
</foo>
XML;

$xml = new SimpleXMLElement($xmlString);

$xpath = '/*//*[
  normalize-space(.) = "" and
  not(
    @* or 
    .//*[@*] or 
    .//comment() or
    .//processing-instruction()
  )
]';
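// Reverse the result so that child nodes are removed before their parents.
foreach (array_reverse($xml->xpath($xpath)) as $remove) {
  // unset() on index 0 removes the matched element from the document.
  unset($remove[0]);
}

echo $xml->asXML();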

Output:

<?xml version="1.0"?>
<foo>



  <bar attr="value"/>
  <bar>text</bar>
  <bar>

   text
  </bar>
  <bar>
   <!-- comment -->
  </bar>
</foo>
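
The blank lines in the output are the whitespace text nodes that surrounded the removed elements. If they matter, one possible follow-up (not part of the approach above, and only a sketch) is to reload the result into a DOMDocument with preserveWhiteSpace disabled and formatOutput enabled. The $cleaned string is a hypothetical stand-in for the value returned by $xml->asXML().

```php
<?php
// Hypothetical stand-in for the string returned by $xml->asXML().
$cleaned = "<foo>\n  \n  \n  <bar attr=\"value\"/>\n  <bar>text</bar>\n</foo>";

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // drop whitespace-only text nodes while loading
$dom->formatOutput = true;        // re-indent the document when saving
$dom->loadXML($cleaned);

echo $dom->saveXML();
```

Both properties have to be set before loadXML() is called. Whitespace that belongs to a text node with other content, such as the mixed-content <bar> in the example above, is not touched by this.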

Upvotes: 4
