tarekeldarwiche

Reputation: 105

Find and Increment a Number in an XML File

I'm trying to search for a string in an XML file, increment the number that immediately follows it by 1, and then save the change back to the same file. The string occurs only once in the file.

My file looks like this:

        <attribute>
                <name>test</name>
                <type>java.lang.String</type>
                <value>node1-3</value>
        </attribute>

I'm trying to change the 3 (after node1-) and increment it by 1 every time I run a command. I've tried the following sed, separating that line into 4 pieces, and replacing it with those 4 pieces, plus an increment. Unfortunately, it doesn't seem to do anything:

 sed -i -r -e 's/(.*)(\node1-)([0-9]+)(.*)/echo "\1\2$((\3+1))\4"/g' filepath

I've also tried awk, which seems to be getting me somewhere, but I'm not sure how to append the second half of the line back in:

awk '{FS=OFS="-" }/node1/{$2+=1}1' filepath

Finally, I tried perl, but it's incrementing the wrong number (node1 becomes node2) rather than the number after the dash:

perl -i -pe '/node1-/ && s/(\d+)(.*)/$1+1 . $2/e' filepath

I'm new to these commands and not so solid on regex. I'm trying to get this working so that I can use it in a bash script I'm writing. What is the best approach to take? Does one command have an advantage over the others? I'd like a one-line command to keep things simple later.

Upvotes: 10

Views: 1576

Answers (5)

zdim

Reputation: 66881

Process the file using an XML parser. This is just better in every way than hacking it with a regex.

use warnings;
use strict;

use XML::LibXML;

my $file = shift // die "Usage: $0 file\n";

my $doc = XML::LibXML->load_xml(location => $file);

my ($node) = $doc->findnodes('//value');

my $new_value = $node->to_literal =~ s/node1\-\K([0-9]+)/1+$1/er;

$node->removeChildNodes();
$node->appendText($new_value);

$doc->toFile('new_' . $file);   # or just $file to overwrite

Change the output filename to the input name ($file) to overwrite, once tested fully.

Removing and adding a node like above is one way to change an XML object.

Or, setData on the first child

$node->firstChild->setData($new_value);

where setData can be used on a node of type text, cdata or comment.

Or, search for text and then work with a text node directly

my ($tnode) = $doc->findnodes('//value/text()');

my $new_value = $tnode =~ s/node1\-\K([0-9]+)/1+$1/er;

$tnode->setData($new_value);

print $doc->toString;

There are more ways. Which method to use depends on everything else that needs to be done. If the sole job is indeed just to edit that text, then the simplest way is probably to get a text node.

Upvotes: 7

brian d foy

Reputation: 132822

Just for fun, I used Perl's Mojo::DOM to do the same task using CSS selectors. This isn't as powerful as XML::Twig (no stream parsing!), but for simple things it can work out nicely:

#!perl
use v5.26;

use Mojo::DOM;

my $xml = <<~"XML";
    <attribute>
        <name>test</name>
        <type>java.lang.String</type>
        <value>node1-3</value>
    </attribute>
    XML

my $dom = Mojo::DOM->new( $xml );
my $node = $dom->at( 'attribute value' ); # CSS Selector

my $current = $node->text;
say "Current text is $current";

# how you change the value is up to you. This line is
# just how I did it.
my $next = $current =~ s/(\d+)\z/ $1 + 1 /re;
say "Next text is $next";

$node->content( $next );

say $dom;

It's not so bad as a one-liner, but it's a bit verbose for that. The -0777 switch enables slurp mode, which reads the whole file in a single read (paragraph mode would be -00). The file name is the command-line argument at the end:

$ perl -MMojo::DOM -0777 -E '$d=Mojo::DOM->new(<>); $n=$d->at(q(attribute value)); $n->content($n->text =~ s/(\d+)\z/$1+1/er); say $d' text.xml
<attribute>
    <name>test</name>
    <type>java.lang.String</type>
    <value>node1-4</value>
</attribute>

Mojo has an ojo module (so, with -M, it spells -Mojo) that makes this slightly simpler at the expense of declaring variables. Its x() is a shortcut for Mojo::DOM->new():

$ perl -Mojo -0777 -E 'my $d=x(<>); my $n=$d->at(q(attribute value)); $n->content($n->text =~ s/(\d+)\z/$1+1/er); say $d' text.xml
<attribute>
    <name>test</name>
    <type>java.lang.String</type>
    <value>node1-4</value>
</attribute>

Upvotes: 5

brian d foy

Reputation: 132822

I don't like using line-oriented text processing for modifying XML. You lose context and position and you can't tell if you are actually modifying what you think you are (inside comments, CDATA, etc).

But, ignoring that, your one-liner has an easy fix. Basically, you aren't anchoring correctly: you match the first group of digits when you want the second:

$ perl -i -pe '/node1-/ && s/(\d+)(.*)/$1+1 . $2/e' filepath

Instead, match a group of digits immediately before a <. The (?=...) is a positive lookahead that doesn't match characters (just the condition), so you don't substitute those:

$ perl -i -pe '/node1-/ && s/(\d+)(?=<)/$1+1/e' filepath

However, I'd combine the first match. The \K allows you to ignore part of a substitution's match. You have to match the stuff before \K, but you won't replace that part:

$ perl -i -pe 's/node1-\K(\d+)/$1+1/e' filepath

Again, these might work, but eventually you (more likely the next guy) will be burned by it. I don't know your situation, but as I often advise people: it's not the rarity, it's the calamity.

Upvotes: 4

brian d foy

Reputation: 132822

Here's an example using Perl's XML::Twig. Basically, you create a handler for a node, then do whatever you need to do in that handler. You can see the current text, make a new string, and set the node text to that string. It's a bit intimidating at first, but it's very powerful once you get used to it. I prefer this to other Perl XML parsers, but for very simple things it might not be the best tool:

#!perl
use v5.26;

use XML::Twig;

my $xml = <<~"XML";
    <attribute>
        <name>test</name>
        <type>java.lang.String</type>
        <value>node1-3</value>
    </attribute>
    XML

my $twig = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => {
        # the key is the name of the node you want to process
        value => sub {
            # each handler gets the twig and the current node
            my( $t, $node ) = @_;
            my $current = $node->text;
            # how you modify the text is not important. This
            # is just a Perl substitution that does not modify
            # the original but returns the new string
            my $next = $current =~ s/(\d+)\z/ $1 + 1 /re;
            $node->set_text( $next );
            }
        }
    );
$twig->parse( $xml );
my $updated_xml = $twig->sprint;

say $updated_xml;


Upvotes: 6

OpenSauce

Reputation: 8623

Can you just hard-code the final part of the node line?

$ awk '{FS=OFS="-" }/node1/{$2+=1; print $1 "-" $2 "</value>"} $0 !~ /node1/ {print}' file
  <attribute>
          <name>test</name>
          <type>java.lang.String</type>
          <value>node1-4</value>
  </attribute>
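If hard-coding the closing tag feels brittle, one possible alternative (a sketch, not this answer's method) is awk's match(), which sets RSTART and RLENGTH so that the rest of the line can be reattached untouched. The scratch file and the variable names prefix, num and bumped are assumptions for illustration.

```shell
#!/bin/sh
# Scratch copy of the sample from the question (hypothetical path).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
        <attribute>
                <name>test</name>
                <type>java.lang.String</type>
                <value>node1-3</value>
        </attribute>
EOF

# awk has no portable in-place mode, so write to a second file and
# move it over the original once awk succeeds.
awk '
match($0, /node1-[0-9]+/) {
    prefix = "node1-"
    # Digits that follow the prefix inside the matched text.
    num = substr($0, RSTART + length(prefix), RLENGTH - length(prefix))
    bumped = num + 1
    # Rebuild the line: text before the match, prefix, new number,
    # then everything after the match, whatever it is.
    $0 = substr($0, 1, RSTART - 1) prefix bumped substr($0, RSTART + RLENGTH)
}
{ print }
' "$tmp" > "$tmp.new" && mv "$tmp.new" "$tmp"

result=$(sed -n 's:.*<value>\(.*\)</value>.*:\1:p' "$tmp")
echo "$result"   # node1-4

rm -f "$tmp"
```

Because the line is rebuilt from substrings rather than from fields, the indentation and the closing tag are left as they were.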

Upvotes: 3
