Replacing the first occurrence of a line after the first matched line

Let's assume the following XML file:

    some text
    <addresses>
      <something/>
    </addresses>
    some more text
    <addresses xmlns="namespace">
      <could be anything/>
    </addresses>
    some other text
    <addresses>
      <something else/>
    </addresses>
    ...

I need to replace the first </addresses> following the first <addresses xmlns="namespace"> with </namespace:addresses>, so that the file becomes:

    some text
    <addresses>
      <something/>
    </addresses>
    some more text
    <addresses xmlns="namespace">
      <could be anything/>
    </namespace:addresses>
    some other text
    <addresses>
      <something else/>
    </addresses>
    ...

I am aware of this similar thread, but none of the following solutions changes anything:

sed -e '/<addresses xmlns="namespace">/!b' -e ':a' -e "s/<\/namespace:addresses>/<\/addresses>/;t trail" -e 'n;ba' -e ':trail' -e 'n;btrail' file.xml
sed -e "/<addresses xmlns=\"namespace\">/,/./  s/<\/namespace:addresses>/<\/addresses>/" file.xml
sed -e "/<addresses xmlns=\"namespace\">/,/<\/namespace:addresses>/  s/<\/namespace:addresses>/<\/addresses>/" file.xml

For instance:

sed -e "/<addresses xmlns=\"namespace\">/,/./  s/<\/namespace:addresses>/<\/addresses>/" file.xml
    some text
    <addresses>
      <something/>
    </addresses>
    some more text
    <addresses xmlns="namespace">
      <could be anything/>
    </addresses>
    some other text
    <addresses>
      <something else/>
    </addresses>
    ...

Maybe this issue is related to the sed version I'm using: 4.7-1ubuntu1 on impish/21.10, or even 4.8-1.

Any suggestions? I'm open to any other tool (perl/awk); the simpler, the better.

Answers (1)

Wiktor Stribiżew

It is much easier with perl than with sed:

perl -0777 -i -pe 's~<(addresses)\s+xmlns="namespace">[^<]*(?:<(?!/\1>)[^<]*)*\K</\1>~</namespace:$1>~' file

Details:

  • <(addresses)\s+xmlns="namespace">[^<]*(?:<(?!/\1>)[^<]*)*\K</\1> - the regex pattern matching
    • < - a < char
    • (addresses) - Group 1 ($1): addresses
    • \s+ - one or more whitespaces
    • xmlns="namespace"> - a fixed string
    • [^<]*(?:<(?!/\1>)[^<]*)* - an unrolled, usually faster alternative to (?s:.)*? that matches any text up to the nearest </addresses> string
    • \K - match reset operator: everything matched so far is dropped from the reported match, so it is kept in the output instead of being replaced
    • </\1> - (this is what is finally consumed and will be replaced): </ + Group 1 value (so as not to repeat addresses) + >
  • </namespace:$1> - the replacement is </namespace: + Group 1 value + >.

Only the first occurrence is replaced because -0777 slurps the whole file into a single multiline string and the substitution has no g flag.

Note the different backreference syntax in the perl command: \1 refers to Group 1 inside the pattern, while $1 refers to it in the replacement.

A test in the shell:

s='    some text
    <addresses>
      <something/>
    </addresses>
    some more text
    <addresses xmlns="namespace">
      <could be anything/>
    </addresses>
    some other text
    <addresses>
      <something else/>
    </addresses>
    ...'
perl -0777 -pe 's~<(addresses)\s+xmlns="namespace">[^<]*(?:<(?!/\1>)[^<]*)*\K</\1>~</namespace:$1>~' <<< "$s"

Output:

    some text
    <addresses>
      <something/>
    </addresses>
    some more text
    <addresses xmlns="namespace">
      <could be anything/>
    </namespace:addresses>
    some other text
    <addresses>
      <something else/>
    </addresses>
    ...
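
As a side note on the sed attempts in the question: each of them searches for </namespace:addresses>, which never occurs in the input, and replaces it with </addresses>, so nothing can change. Below is a minimal sketch with pattern and replacement swapped. It assumes the closing tag sits on a later line than the opening tag, and the sample data is shortened from the question. Unlike the perl command, this range form rewrites every <addresses xmlns="namespace"> block, not only the first one.

```shell
# Shortened sample input (surrounding text lines omitted).
xml='<addresses>
  <something/>
</addresses>
<addresses xmlns="namespace">
  <could be anything/>
</addresses>
<addresses>
  <something else/>
</addresses>'

# The range runs from the namespaced opening tag to the next </addresses>;
# the substitution is applied only to lines inside that range.
fixed=$(printf '%s\n' "$xml" |
  sed -e '/<addresses xmlns="namespace">/,/<\/addresses>/ s/<\/addresses>/<\/namespace:addresses>/')

printf '%s\n' "$fixed"
```

If only the first namespaced block must be touched and several may exist, the perl command above remains the safer choice.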
