Reputation: 443
Let's assume the following XML file:
some text
<addresses>
<something/>
</addresses>
some more text
<addresses xmlns="namespace">
<could be anything/>
</addresses>
some other text
<addresses>
<something else/>
</addresses>
...
I need to replace the first </addresses> following the first <addresses xmlns="namespace"> with </namespace:addresses> so that the file becomes:
some text
<addresses>
<something/>
</addresses>
some more text
<addresses xmlns="namespace">
<could be anything/>
</namespace:addresses>
some other text
<addresses>
<something else/>
</addresses>
...
I am aware of this similar thread, but none of the following solutions changes anything:
sed -e '/<addresses xmlns="namespace">/!b' -e ':a' -e "s/<\/namespace:addresses>/<\/addresses>/;t trail" -e 'n;ba' -e ':trail' -e 'n;btrail' file.xml
sed -e "/<addresses xmlns=\"namespace\">/,/./ s/<\/namespace:addresses>/<\/addresses>/" file.xml
sed -e "/<addresses xmlns=\"namespace\">/,/<\/namespace:addresses>/ s/<\/namespace:addresses>/<\/addresses>/" file.xml
For instance:
sed -e "/<addresses xmlns=\"namespace\">/,/./ s/<\/namespace:addresses>/<\/addresses>/" file.xml
some text
<addresses>
<something/>
</addresses>
some more text
<addresses xmlns="namespace">
<could be anything/>
</addresses>
some other text
<addresses>
<something else/>
</addresses>
...
Maybe this issue is linked to the sed I'm using: 4.7-1ubuntu1 on impish/21.10 or even 4.8-1.
Any suggestion? I'm open to any other tool (perl/awk), the simpler, the better.
Upvotes: 3
Views: 118
Reputation: 627044
It is much easier with perl than with sed:
perl -0777 -i -pe 's~<(addresses)\s+xmlns="namespace">[^<]*(?:<(?!/\1>)[^<]*)*\K</\1>~</namespace:$1>~' file
See the online demo. Details:
- <(addresses)\s+xmlns="namespace">[^<]*(?:<(?!/\1>)[^<]*)*\K</\1> - the regex pattern, matching:
  - < - a < char
  - (addresses) - Group 1 ($1): addresses
  - \s+ - one or more whitespace characters
  - xmlns="namespace"> - a fixed string
  - [^<]*(?:<(?!/\1>)[^<]*)* - a much faster alternative to (?s:.)*?; basically, it matches any text up to a </addresses> string
  - \K - the match reset operator, which omits all text matched so far from the current match memory buffer
  - </\1> - (this is what is finally consumed and will be replaced): </ + Group 1 value (so as not to repeat addresses) + >
- </namespace:$1> - the replacement is </namespace: + Group 1 value + >.

It replaces the first occurrence because -0777 slurps the file into a single multiline text and there is no g flag.
Note the difference between the \1 backreference syntax inside the pattern and the $1 replacement backreference in the replacement part of the perl command.
See the online demo:
s='some text
<addresses>
<something/>
</addresses>
some more text
<addresses xmlns="namespace">
<could be anything/>
</addresses>
some other text
<addresses>
<something else/>
</addresses>
...'
perl -0777 -pe 's~<(addresses)\s+xmlns="namespace">[^<]*(?:<(?!/\1>)[^<]*)*\K</\1>~</namespace:$1>~' <<< "$s"
Output:
some text
<addresses>
<something/>
</addresses>
some more text
<addresses xmlns="namespace">
<could be anything/>
</namespace:addresses>
some other text
<addresses>
<something else/>
</addresses>
...
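As a side note, the sed attempts in the question change nothing because they search for </namespace:addresses> and replace it with </addresses>, which is the reverse of what is wanted; the input contains no </namespace:addresses>, so nothing ever matches. If a line-based tool is preferred, the following awk sketch may do. It assumes the tags sit on separate lines as in the sample; the temporary file and the variable names inblock and done are only illustrative.

```shell
# Sample from the question, written to a temporary file for the demonstration.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
some text
<addresses>
<something/>
</addresses>
some more text
<addresses xmlns="namespace">
<could be anything/>
</addresses>
some other text
<addresses>
<something else/>
</addresses>
EOF

result=$(awk '
  # Start looking once the first namespaced opening tag has been seen.
  !done && /<addresses xmlns="namespace">/ { inblock = 1 }
  # Rewrite the first closing tag after it, then stop substituting.
  inblock && !done && sub(/<\/addresses>/, "</namespace:addresses>") { done = 1 }
  { print }
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Unlike the perl version, this works line by line, so it would need adjusting if the opening and closing tags could share a line.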
Upvotes: 1