snackbar

Reputation: 93

Perl code to delete a multi line XML node

I have an XML file, test.xml:

<many-nested-roots>

    <foo>
      <bar>
      </bar>
    </foo>
    
    <other-random-nodes></other-random-nodes>

    <foo>
      <bar>
        <foobar>
        </foobar>
      </bar>
    </foo>
    
    <!-- multiple such blocks not in any particular order -->

</many-nested-roots>

I need to delete the XML node <foo><bar></bar></foo>, but not <foo><bar><foobar></foobar></bar></foo>.

EDIT: The node <foo><bar></bar></foo> occurs multiple times, at arbitrary positions, in a heavily nested XML document.

What I tried which doesn't work:

perl -ne 'print unless /^\s*<foo>\n\s*<bar>\n\s*<\/bar>\n\s*<\/foo>/' test.xml

^ This doesn't match across newlines.

perl -ne 'print unless /<foo>/ ... /<\/foo>/' test.xml

^ This deletes every <foo> block, including the one that contains <foobar>.

perl -ne 'print unless /<foo>.*?<bar>.*?<\/bar>.*?<\/foo>/s' test.xml

^ I used /s to let . match newlines. It doesn't work either.

Upvotes: 1

Views: 198

Answers (3)

Shawn

Reputation: 52344

A one-liner using XML::LibXML and an XPath expression to find the nodes to delete:

perl -MXML::LibXML -E '
  my $dom = XML::LibXML->load_xml(location => $ARGV[0]);
  $_->unbindNode for $dom->documentElement->find("//foo/bar[count(*)=0]/..")->@*;
  print $dom->serialize' test.xml

(Versions of perl older than 5.24 need @{$dom->...} instead of $dom->...->@*.)

Or using xmlstarlet (not perl, but very handy for scripted manipulation of XML files):

 xmlstarlet ed -d '//foo/bar[count(*)=0]/..' test.xml

Upvotes: 5

Tekki

Reputation: 351

As @Shawn and @tshiono said, you should use an XML parser rather than a regex. Here is an example (a script, not a one-liner) using Mojo::DOM, which ships with Mojolicious:

#!/usr/bin/env perl
use Mojo::Base -strict, -signatures;

use Mojo::DOM;
use Mojo::File 'path';

my $dom = Mojo::DOM->new->xml(1)->parse(path($ARGV[0])->slurp);
$dom->find("foo bar")->each(
  sub ($el, $i) { $el->parent->remove if $el->children->size == 0 }
);
print $dom;

If you save it as myscript.pl and make it executable, you can run it with ./myscript.pl test.xml.

Upvotes: 4

tshiono

Reputation: 22012

Would you please try:

perl -0777 -pe 's#<foo>\s*<bar>\s*</bar>\s*</foo>\s*##g' test.xml

The -0777 option tells perl to slurp the whole file at once, so the regex can match across lines.

Please note that parsing XML files with a regex is not recommended. Perl has several modules for handling XML files, such as XML::Simple. As a standalone program, xmlstarlet is a handy tool for manipulating XML files.

Upvotes: 1
