Reputation: 863
Standing on the greatness of others on the web (props to them), I ran across this command:
perl -0 -p -i -e 's|<nodeName>.*?</nodeName>|$&=~/this/?"":$&|gse' file
It will find each XML node (in this case "nodeName"), look inside it for a specific string (in this case, "this"), and delete the entire node if the string is there. It's pretty sweet.
With this command, a file that looks like this:
<nodeName>
<subNode>those</subNode>
</nodeName>
<nodeName>
<subNode>this</subNode>
</nodeName>
<nodeName>
<subNode>that</subNode>
</nodeName>
<nodeName>
<subNode>these</subNode>
</nodeName>
Will come out looking like this:
<nodeName>
<subNode>those</subNode>
</nodeName>
<nodeName>
<subNode>that</subNode>
</nodeName>
<nodeName>
<subNode>these</subNode>
</nodeName>
However, my needs are for it to look for "this" or "that", and if it finds either, delete the entire node. So to that effect, I'm using this command:
perl -0 -p -i -e 's|<nodeName>.*?</nodeName>|$&=~/this/?"":$&|gse' file;perl -0 -p -i -e 's|<nodeName>.*?</nodeName>|$&=~/that/?"":$&|gse' file
This is basically "run the command twice to look for 2 different things, but perform the same action." My question in all of this is, can the original perl command be simplified to look for "this" OR "that" in one command?
I've tried this:
perl -0 -p -i -e 's|<nodeName>.*?</nodeName>|$&=~/(this|that)/?"":$&|gse' file
But I'm kinda green on perl. I thought this would work similar to this:
s/(dog|cat)s are (invited|welcome)/$1s are not $2/;
But it doesn't. I'm not sure if what I'm hoping to accomplish is even possible. So in closing, I did get a bit rambly. To restate the question: can the original perl command be simplified to look for "this" OR "that" in one command?
Thank you in advance.
NOTE: I'm working on servers that do not have xmlstarlet installed, and I don't have authorization to install it.
Upvotes: 2
Views: 163
Reputation: 53478
Ugh, please don't do that. XML is not suitable for parsing with regular expressions. The same XML can be written in a variety of semantically identical ways (whitespace inside tags, reflowed lines, character references, CDATA sections), and a regular expression written against one form just doesn't match the others.
Please - on behalf of future sysadmins and maintenance programmers - use a parser instead.
If you want to delete 'nodeName' containing text 'this' or 'that':
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
XML::Twig->new(
    'pretty_print'  => 'indented_a',
    'twig_handlers' => {
        'nodeName' => sub { $_->delete if $_->text =~ m/this|that/ }
    }
)->parse( \*DATA )->print;
__DATA__
<root>
<nodeName>
<subNode>those</subNode>
</nodeName>
<nodeName>
<subNode>this</subNode>
</nodeName>
<nodeName>
<subNode>that</subNode>
</nodeName>
<nodeName>
<subNode>these</subNode>
</nodeName>
</root>
This sets a twig handler that 'catches' nodeName and deletes it if a condition applies.
If you want to one-liner it:
perl -MXML::Twig -e 'XML::Twig->new( pretty_print => "indented_a", twig_handlers => { nodeName => sub { $_->delete if $_->text =~ m/this|that/ } } )->parsefile( $ARGV[0] )->print;' file
You can also use parsefile_inplace to change the original source file too.
Upvotes: 2
Reputation: 385655
perl -i -0777pe's{
<nodeName>
(?: (?!</nodeName>). )*
(?: this | that )
(?: (?!</nodeName>). )*
</nodeName>
}{}xsg' file
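For anyone new to this idiom, a rough sketch of why the `(?: (?!</nodeName>). )*` groups matter, run against a copy of the sample from the question (the temp directory and variable names are only for illustration). Each group consumes one character at a time, and only while a closing tag does not start at that position, so a single match can never run from one node into the next. A plain `.*?` has no such restriction:

```shell
tmp=$(mktemp -d)
cat > "$tmp/sample.xml" <<'EOF'
<nodeName>
<subNode>those</subNode>
</nodeName>
<nodeName>
<subNode>this</subNode>
</nodeName>
<nodeName>
<subNode>that</subNode>
</nodeName>
<nodeName>
<subNode>these</subNode>
</nodeName>
EOF

# Naive variant (not from the answer): .*? is free to cross </nodeName>
# to reach "this", so the match that starts at the first node also
# swallows the "those" node.
naive=$(perl -0777 -pe 's{<nodeName>.*?(?:this|that).*?</nodeName>}{}sg' "$tmp/sample.xml")

# Pattern from the answer, printing to stdout instead of editing in place.
tempered=$(perl -0777 -pe 's{
    <nodeName>
    (?: (?!</nodeName>). )*
    (?: this | that )
    (?: (?!</nodeName>). )*
    </nodeName>
}{}xsg' "$tmp/sample.xml")

naive_kept=$(printf '%s\n' "$naive" | grep -c '<subNode>')
tempered_kept=$(printf '%s\n' "$tempered" | grep -c '<subNode>')
echo "naive keeps $naive_kept node(s), tempered keeps $tempered_kept"
# prints "naive keeps 1 node(s), tempered keeps 2"
```

With the naive pattern only "these" is left; with the pattern above both "those" and "these" survive, which is what the question asked for.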
Upvotes: 2
Reputation: 54323
Since your outer regular expression uses the pipe | as its delimiter, you break the pattern when you also use the pipe as "or" in your inner regex.
perl -0 -p -i -e 's{<nodeName>.*?</nodeName>}{$&=~/(?:this|that)/?"":$&}gse' file
Like that it should work. I've replaced the pipes with {}. I've also added a non-capture group for good measure, as there is no reason to keep the this|that match available.
You could of course also just escape the inner |, but the above solution is clearer.
perl -0 -p -i -e 's|<nodeName>.*?</nodeName>|$&=~/(this\|that)/?"":$&|gse' file
Also note that it might work for your one tag per line file, but it will break if your XML is more complex.
Upvotes: 5