RussNS

Reputation: 863

Remove XML Node Using Perl - This or That

Standing on the greatness of others on the web (props to them), I ran across this command:

perl -0 -p -i -e 's|<nodeName>.*?</nodeName>|$&=~/this/?"":$&|gse' file

It finds each XML node (in this case "nodeName"), checks whether it contains a specific string (in this case, "this"), and deletes the entire node if it does. It's pretty sweet.

With this command, a file that looks like this:

<nodeName>
    <subNode>those</subNode>
</nodeName>
<nodeName>
    <subNode>this</subNode>
</nodeName>
<nodeName>
    <subNode>that</subNode>
</nodeName>
<nodeName>
    <subNode>these</subNode>
</nodeName>

Will come out looking like this:

<nodeName>
    <subNode>those</subNode>
</nodeName>
<nodeName>
    <subNode>that</subNode>
</nodeName>
<nodeName>
    <subNode>these</subNode>
</nodeName>

However, I need it to look for "this" or "that" and, if it finds either, delete the entire node. To that end, I'm using this command:

perl -0 -p -i -e 's|<nodeName>.*?</nodeName>|$&=~/this/?"":$&|gse' file;perl -0 -p -i -e 's|<nodeName>.*?</nodeName>|$&=~/that/?"":$&|gse' file

This is basically "run a command twice to look for 2 different things, but perform the same action." My question in all of this is, can the original perl command be simplified to look for "this" OR "that" in one command?

I've tried this:

perl -0 -p -i -e 's|<nodeName>.*?</nodeName>|$&=~/(this|that)/?"":$&|gse' file

But I'm kinda green on perl. I thought this would work similar to this:

s/(dog|cat)s are (invited|welcome)/$1s are not $2/;

But it doesn't. I'm not sure if what I'm hoping to accomplish is even possible. So in closing, I did get a bit rambly. To restate the question: can the original perl command be simplified to look for "this" OR "that" in one command?

Thank you in advance.

NOTE: I'm working on servers that do not have xmlstarlet installed, and I don't have authorization to install it.

Upvotes: 2

Views: 163

Answers (3)

Sobrique

Reputation: 53478

Ugh, please don't do that. XML is not suitable for parsing with regular expressions. The same XML can be written in a variety of semantically identical ways (extra whitespace, attributes, different line breaks), and a regular expression written for one of them just doesn't match the others.

Please - on behalf of future sysadmins and maintenance programmers - use a parser instead.

If you want to delete 'nodeName' containing text 'this' or 'that':

#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

XML::Twig->new(
    'pretty_print'  => 'indented_a',
    'twig_handlers' => {
        'nodeName' => sub { $_->delete if $_->text =~ m/this|that/ }
    }
)->parse( \*DATA )->print;

__DATA__
<root>
<nodeName>
    <subNode>those</subNode>
</nodeName>
<nodeName>
    <subNode>this</subNode>
</nodeName>
<nodeName>
    <subNode>that</subNode>
</nodeName>
<nodeName>
    <subNode>these</subNode>
</nodeName>
</root>

This sets a twig handler that 'catches' nodeName and deletes it if a condition applies.

If you want to one-liner it:

perl -MXML::Twig -e 'XML::Twig->new( pretty_print => "indented_a", twig_handlers => { nodeName => sub { $_->delete if $_->text =~ m/this|that/ } } )->parsefile( $ARGV[0] )->print;' file

You can also use parsefile_inplace to change the original source file.

Upvotes: 2

ikegami

Reputation: 385655

perl -i -0777pe's{
   <nodeName>
   (?: (?!</nodeName>). )*
   (?: this | that )
   (?: (?!</nodeName>). )*
   </nodeName>
}{}xsg' file

Upvotes: 2

simbabque

Reputation: 54323

Your outer substitution uses the pipe | as its delimiter, so the pipe you use for alternation in the inner regex ends the replacement part early and breaks the command.

perl -0 -p -i -e 's{<nodeName>.*?</nodeName>}{$&=~/(?:this|that)/?"":$&}gse' file

That should work. I've replaced the pipe delimiters with {}. I've also used a non-capturing group for good measure, as there is no reason to capture the this|that match.

You could of course also just escape the inner |, but the above solution is clearer.

perl -0 -p -i -e 's|<nodeName>.*?</nodeName>|$&=~/(this\|that)/?"":$&|gse' file

Also note that it might work for your one-tag-per-line file, but it will break if your XML is more complex.

Upvotes: 5
