Reputation: 105
I'm trying to search for a string in an XML file, increment the number that immediately follows it by 1, and then save the changes back to that same file. There is only one instance of this string.
My file looks like this:
<attribute>
<name>test</name>
<type>java.lang.String</type>
<value>node1-3</value>
</attribute>
I'm trying to increment the 3 (after node1-) by 1 every time I run a command. I've tried the following sed command, which splits that line into 4 pieces and replaces it with those 4 pieces plus an increment. Unfortunately, it doesn't seem to do anything:
sed -i -r -e 's/(.*)(\node1-)([0-9]+)(.*)/echo "\1\2$((\3+1))\4"/g' filepath
I've also tried awk, which seems to be getting me somewhere, but I'm not sure how to append the second half of the line back in:
awk '{FS=OFS="-" }/node1/{$2+=1}1' filepath
Finally, I tried perl, but it's incrementing the wrong number, changing node1 to node2 rather than the number after the dash:
perl -i -pe '/node1-/ && s/(\d+)(.*)/$1+1 . $2/e' filepath
I'm new to these commands and not so solid on my regex. I'm trying to get this command working so that I can use it in a bash script I'm writing. What is the best approach to take? Does any of these commands have an advantage over the others? I'd like a one-line command to simplify things for later.
Upvotes: 10
Views: 1576
Reputation: 66881
Process the file using an XML parser. This is just better in every way than hacking it with a regex.
use warnings;
use strict;
use XML::LibXML;
my $file = shift // die "Usage: $0 file\n";
my $doc = XML::LibXML->load_xml(location => $file);
my ($node) = $doc->findnodes('//value');
my $new_value = $node->to_literal =~ s/node1\-\K([0-9]+)/1+$1/er;
$node->removeChildNodes();
$node->appendText($new_value);
$doc->toFile('new_' . $file); # or just $file to overwrite
Change the output filename to the input name ($file) to overwrite the original, once fully tested.
Removing and adding a node like above is one way to change an XML object.
Or, call setData on the first child:
$node->firstChild->setData($new_value);
where setData can be used on a node of type text, cdata, or comment.
Or, search for text and then work with a text node directly
my ($tnode) = $doc->findnodes('//value/text()');
my $new_value = $tnode =~ s/node1\-\K([0-9]+)/1+$1/er;
$tnode->setData($new_value);
print $doc->toString;
There's more. Which method to use depends on everything that needs to be done. If the sole job is indeed just to edit that text, then the simplest way is probably to get a text node.
Upvotes: 7
Reputation: 132822
Just for fun, I used Perl's Mojo::DOM to do the same task using CSS selectors. This isn't as powerful as XML::Twig (no stream parsing!), but for simple things it can work out nicely:
#!perl
use v5.26;
use Mojo::DOM;
my $xml = <<~"XML";
<attribute>
<name>test</name>
<type>java.lang.String</type>
<value>node1-3</value>
</attribute>
XML
my $dom = Mojo::DOM->new( $xml );
my $node = $dom->at( 'attribute value' ); # CSS Selector
my $current = $node->text;
say "Current text is $current";
# how you change the value is up to you. This line is
# just how I did it.
my $next = $current =~ s/(\d+)\z/ $1 + 1 /re;
say "Next text is $next";
$node->content( $next );
say $dom;
It's not so bad as a one-liner, but it's a bit verbose for that. The -0777 switch enables slurp mode, so the whole file comes in on the first read (the file name is the command-line argument at the end):
$ perl -MMojo::DOM -0777 -E '$d=Mojo::DOM->new(<>); $n=$d->at(q(attribute value)); $n->content($n->text =~ s/(\d+)\z/$1+1/er); say $d' text.xml
<attribute>
<name>test</name>
<type>java.lang.String</type>
<value>node1-4</value>
</attribute>
Mojo has an ojo module (so, with -M, it spells Mojo) that makes this slightly simpler, at the expense of having to declare variables. Its x() is a shortcut for Mojo::DOM->new():
$ perl -Mojo -0777 -E 'my $d=x(<>); my $n=$d->at(q(attribute value)); $n->content($n->text =~ s/(\d+)\z/$1+1/er); say $d' text.xml
<attribute>
<name>test</name>
<type>java.lang.String</type>
<value>node1-4</value>
</attribute>
Upvotes: 5
Reputation: 132822
I don't like using line-oriented text processing for modifying XML. You lose context and position and you can't tell if you are actually modifying what you think you are (inside comments, CDATA, etc).
But, ignoring that, here's your one-liner that has an easy fix. Basically, you aren't anchoring correctly. You match the first group of digits when you want the second:
$ perl -i -pe '/node1-/ && s/(\d+)(.*)/$1+1 . $2/e' filepath
Instead, match a group of digits immediately before a <. The (?=...) is a positive lookahead that doesn't consume characters (it only checks the condition), so you don't substitute those:
$ perl -i -pe '/node1-/ && s/(\d+)(?=<)/$1+1/e' filepath
However, I'd fold the first match into the substitution. The \K lets you keep part of a substitution's match out of the replacement. You still have to match the stuff before \K, but you won't replace that part:
$ perl -i -pe 's/node1-\K(\d+)/$1+1/e' filepath
Again, these might work, but eventually you (more likely the next guy) will be burned by it. I don't know your situation, but as I often advise people: it's not the rarity, it's the calamity.
Upvotes: 4
Reputation: 132822
Here's an example using Perl's XML::Twig. Basically, you create a handler for a node, then do whatever you need to do in that handler. You can see the current text, make a new string, and set the node text to that string. It's a bit intimidating at first, but it's very powerful once you get used to it. I prefer this to other Perl XML parsers, but for very simple things it might not be the best tool:
#!perl
use v5.26;
use XML::Twig;
my $xml = <<~"XML";
<attribute>
<name>test</name>
<type>java.lang.String</type>
<value>node1-3</value>
</attribute>
XML
my $twig = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => {
        # the key is the name of the node you want to process
        value => sub {
            # each handler gets the twig and the current node
            my( $t, $node ) = @_;
            my $current = $node->text;
            # how you modify the text is not important. This
            # is just a Perl substitution that does not modify
            # the original but returns the new string
            my $next = $current =~ s/(\d+)\z/ $1 + 1 /re;
            $node->set_text( $next );
        }
    }
);
$twig->parse( $xml );
my $updated_xml = $twig->sprint;
say $updated_xml;
Upvotes: 6
Reputation: 8623
Can you just hard-code the final part of the node line?
$ awk '{FS=OFS="-" }/node1/{$2+=1; print $1 "-" $2 "</value>"} $0 !~ /node1/ {print}' file
<attribute>
<name>test</name>
<type>java.lang.String</type>
<value>node1-4</value>
</attribute>
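If hard-coding the closing tag feels fragile, one possible alternative is to let awk's match() locate the number and copy the rest of the line through unchanged. This is only a sketch: the temporary file stands in for the real path, and the write-then-mv step is an assumption about how you would want to save the result, since plain awk has no in-place option:

```shell
#!/bin/sh
# Work on a throwaway copy of the sample so no real file is touched.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'XML'
<attribute>
<name>test</name>
<type>java.lang.String</type>
<value>node1-3</value>
</attribute>
XML

# match() sets RSTART and RLENGTH for the "node1-<digits>" span; the text
# before and after that span is copied through untouched.
awk '
match($0, /node1-[0-9]+/) {
    prefix_len = length("node1-")
    number = substr($0, RSTART + prefix_len, RLENGTH - prefix_len)
    $0 = substr($0, 1, RSTART + prefix_len - 1) (number + 1) substr($0, RSTART + RLENGTH)
}
{ print }
' "$tmpfile" > "$tmpfile.new" && mv "$tmpfile.new" "$tmpfile"

updated=$(cat "$tmpfile")
echo "$updated"
rm -f "$tmpfile"
```

Because nothing after the number is rebuilt by hand, the same command keeps working if the element name or trailing text changes.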
Upvotes: 3