Overseer10

How to remove only the first group of a pattern?

I have tried a number of ways to approach this but I'm out of ideas. Hopefully someone out there can point out what I am doing wrong.

Here is my input:

<Root>
    <A>Keep</A>
    <B>Keep</B>
    <B>Remove</B>
    <B>Keep</B>
    <C>Keep</C>
</Root>

As you can probably tell by now, I'm just trying to remove line 4, the <B>Remove</B> element, to get:

<Root>
    <A>Keep</A>
    <B>Keep</B>
    <B>Keep</B>
    <C>Keep</C>
</Root>

Here is what I have so far, but it's not quite working as intended:

sed -e '3,${g;s/<B>.*<\/B>//p}' t1

I tried adapting some grouping logic I found elsewhere, but it doesn't work, since sed seems to have no direct way to make a match non-greedy.

Any ideas?

RomanPerekhrest

Hopefully someone out there can point out what I am doing wrong

The right way is to use XML/HTML parsers like xmlstarlet or xmllint:

xmlstarlet ed -O -d "//Root/*[3]" input.xml
  • ed - edit mode
  • -O - omit XML declaration (<?xml ...?>)
  • -d - delete action
  • "//Root/*[3]" - XPath expression selecting the 3rd child element of the Root element (* matches elements only, so whitespace text nodes are not counted)
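A possible variant, assuming you would rather target the element by name than by its overall position: the predicate [2] on B selects the second <B> child of Root, so the deletion still hits the intended element if some other kind of element is later added before it. On the sample input it removes the same <B>Remove</B> line. This is a sketch that reuses only the ed, -O and -d options shown above; the wrapper name delete_second_b is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: delete the second <B> child of <Root> with xmlstarlet.
# Same ed / -O / -d options as the command above; only the XPath differs:
# B[2] counts <B> siblings only, while *[3] counts every child element.
delete_second_b() {
    xmlstarlet ed -O -d '//Root/B[2]' "$1"
}
```

Usage: delete_second_b input.xml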

The output:

<Root>
  <A>Keep</A>
  <B>Keep</B>
  <B>Keep</B>
  <C>Keep</C>
</Root>
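If you have to stay with sed as in the question, one line-oriented sketch is to count the <B> lines in the hold space and delete the second one. It assumes every element sits on its own line, exactly as in the sample input; it is not XML-aware, which is why the parser approach above is the safer choice. The function name drop_second_b is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: delete the second line containing <B> from the given
# file(s), or from stdin when no file is given. Assumes one element per line.
# The hold space acts as a counter: each <B> line appends one "x" to it.
drop_second_b() {
    sed '/<B>/{
        x
        s/^/x/
        /^xx$/{
            x
            d
        }
        x
    }' "$@"
}
```

For the sample, drop_second_b input.xml prints the expected output from the question. Plain sed '4d' input.xml would do the same by line number, but it breaks as soon as the layout shifts.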