NSPKUWCExi2pr8wVoGNk

Reputation: 2559

Find and replace in xml file using sed

I need to find and replace the value of a specific XML element: the enabled element should change from 0 to 1, but only inside somenode elements.

My test xml looks like this:

<somenode name="node1">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</somenode>

<someothernode name="node2">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</someothernode>

<somenode name="node3">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</somenode>

I expect the first and third enabled elements to be changed. So far I have managed to write this sed command:

sed -n "1h;1!H;${;g;s|\(<somenode [^>]*>\)\(.*\)\(<enabled>\s*\)0\(\s*</enabled>\)\(.*</somenode>\)|\1\2\3 1 \4\5|g;p;}" test.xml

but it changes only the last one, which I believe is due to greedy matching. Any help would be appreciated.

Upvotes: 2

Views: 9942

Answers (7)

paxdiablo

Reputation: 881293

Forget sed for complex multi-line processing. Seriously.

If you're not willing to use a proper XML tool, at least use a standard string processing tool that has proper branching statements :-)

If you can guarantee your file is formatted in the way you have it, you can use something like:

pax> echo '<somenode name="node1">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</somenode>

<someothernode name="node2">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</someothernode>

<somenode name="node3">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</somenode>
' | awk '
    BEGIN {s = 0}
    /^<somenode / {s=1}
    /^<\/somenode>/ {s=0}
    /^    <enabled>0<\/enabled>/ {if (s==1) {$0="    <enabled>1</enabled>"}}
    {print}
'

to get:

<somenode name="node1">
    <some></some>
    <enabled>1</enabled>
    <some></some>
</somenode>

<someothernode name="node2">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</someothernode>

<somenode name="node3">
    <some></some>
    <enabled>1</enabled>
    <some></some>
</somenode>

The trouble with that sort of method is that it doesn't handle what may be perfectly valid XML files. This particular version has certain limitations such as:

  • the somenode start and end tags must be at the start of the line.
  • the enabled tag must be preceded by four spaces.

You could work around these to make it a bit more flexible but, by the time you've written your script to handle any valid XML input, it'll be such a monstrosity that it would have been quicker to use an XML transformation tool.

That's why it's better to use a tool built specifically for the job. But, if you just want a quick hack and the file format is under your control, it's probably okay to use the awk (or perl or python or your other quick-and-dirty scripting tool of choice).
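As a rough illustration of working around the two limitations listed above, here is a minimal sketch that matches the tags anywhere on the line and tolerates spaces or tabs around the value. The file names sample.xml and sample.out.xml and the variable name inside are made up for this example, and the script still assumes each tag sits on a single line, so it remains a quick hack rather than an XML parser.

```shell
# Hypothetical sample: indented node tags and irregular spacing around the value
cat > sample.xml <<'EOF'
<root>
  <somenode name="node1">
      <enabled>0</enabled>
  </somenode>
  <someothernode name="node2">
    <enabled>0</enabled>
  </someothernode>
  <somenode name="node3"><some></some>
  <enabled> 0 </enabled>
  </somenode>
</root>
EOF

awk '
    # an opening <somenode ...> tag anywhere on the line switches the flag on
    /<somenode[ >]/ { inside = 1 }
    # while inside, rewrite an enabled value of 0 (optional blanks around it) to 1
    inside { sub(/<enabled>[ \t]*0[ \t]*<\/enabled>/, "<enabled>1</enabled>") }
    # a closing </somenode> tag switches the flag off again
    /<\/somenode>/ { inside = 0 }
    { print }
' sample.xml > sample.out.xml

cat sample.out.xml
```

Because sub() only replaces the matched text, the original indentation of each line is kept; any blanks inside the enabled element are dropped in the rewritten lines.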

Upvotes: 2

yabt

Reputation: 41

Use xmlstarlet if possible:

echo '
<root>
<somenode name="node1">
   <some></some>
   <enabled>0</enabled>
   <some></some>
</somenode>

<someothernode name="node2">
   <some></some>
   <enabled>0</enabled>
   <some></some>
</someothernode>

<somenode name="node3">
   <some></some>
   <enabled>0</enabled>
   <some></some>
</somenode>
</root>
' > testfile.xml


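# check that the file is well-formed
xml val testfile.xml
# list the elements along with their attributes and values
xml el -v testfile.xml

# show the options of the edit subcommand
xml ed --help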

# version 1
xml ed -u "//somenode[1]/enabled" -v '1' \
       -u "//somenode[2]/enabled" -v '1' \
       testfile.xml  

# version 2  (-L for in-place editing; xmlstarlet v1.0.2)
xml ed -L -u "//somenode[@name='node1']/enabled" -v '1' \
          -u "//somenode[@name='node3']/enabled" -v '1' \
          testfile.xml  

Upvotes: 4

Jukka Matilainen

Reputation: 10188

Other people have already explained why it is generally not a good idea to process XML with regular expressions.

With all that in mind, here's the sed program to substitute text matching foo with bar between lines matching start and end (inclusively):

/start/,/end/s/foo/bar/
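Applied to the question's sample, that range form could look like the sketch below. It assumes the layout shown in the question (each tag on its own line, and the value written exactly as <enabled>0</enabled>); the output file name test.out.xml is made up for this example.

```shell
# Recreate the question's sample as test.xml
cat > test.xml <<'EOF'
<somenode name="node1">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</somenode>

<someothernode name="node2">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</someothernode>

<somenode name="node3">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</somenode>
EOF

# Substitute only on lines from a "<somenode " opening tag to the next "</somenode>".
# The trailing space in "<somenode " keeps "<someothernode" from starting a range.
sed '/<somenode /,/<\/somenode>/ s|<enabled>0</enabled>|<enabled>1</enabled>|' test.xml > test.out.xml

cat test.out.xml
```

Since sed works line by line here, there is no greedy multi-line match to fight with: each somenode block is its own range.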

Upvotes: 2

ghostdog74

Reputation: 342323

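Your requirement is quite simple, judging from your description, so there is no need to use XML parsers or tools if you don't want to. You can use just the shell (or other shell tools you may prefer):

#!/bin/bash
# flag is 1 while inside a <somenode> element, 0 otherwise
flag=0
# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line
do
    case "$line" in
        *"<someothernode"* ) flag=0;;
        *"<somenode"* ) flag=1;;
    esac
    if [ "$flag" -eq 1 ]; then
        case "$line" in
            *"<enabled"* )
                # bash pattern substitution: flip the value from 0 to 1
                printf '%s\n' "${line/<enabled>0/<enabled>1}"
                ;;
            *) printf '%s\n' "$line";;
        esac
    else
        printf '%s\n' "$line"
    fi
    # leave the element once its closing tag has been printed
    case "$line" in *"</somenode>"* ) flag=0;; esac
done < "file"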

Upvotes: -1

YOU

Reputation: 123801

It seems you need to loop over something with sed:

http://www.rtfiber.com.tw/~changyj/sed/html/p.20070613a.html

I still can't figure it out myself, though; this is just for your information.

Upvotes: 0

ghostdog74

Reputation: 342323

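You can use gawk (gensub is a gawk extension, and the empty RS makes each blank-line-separated block one record):

gawk -v RS= '/somenode/{ 
    $0=gensub("(.*<enabled>)([01])(</enabled>.*)", "\\11\\3","g",$0) 
}1'  file

Output: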

$ ./shell.sh
<somenode name="node1">
    <some></some>
    <enabled>1</enabled>
    <some></some>
</somenode>
<someothernode name="node2">
    <some></some>
    <enabled>0</enabled>
    <some></some>
</someothernode>
<somenode name="node3">
    <some></some>
    <enabled>1</enabled>
    <some></some>
</somenode>

Upvotes: 0

peter.murray.rust

Reputation: 38033

It is generally a poor idea to try to use regexes to parse XML. See previous discussions such as Parsing XML with REGEX in Java. (Strictly speaking, your XML is not well-formed, since it does not have exactly one root element.) There are many free XML engines for parsing and manipulating XML in almost every language, and I'd recommend you use one of those.

Upvotes: 4
