ToonZ

Reputation: 135

Replace/delete special characters within matched strings in sed

I have a file containing lines like

I want a lot <*tag 1> more <*tag 2>*cheese *cakes.

I am trying to remove every * inside <...> while leaving those outside untouched. The tags can be more complicated than the ones above, for example <*better *tag 1>.

I tried /\bregex\b/s/\*//g, which works for tag 1 but not tag 2. So how can I make it work for tag 2 as well?

Many thanks.

Upvotes: 5

Views: 479

Answers (3)

bambams

Reputation: 765

Obligatory Perl solution (the /r flag needs Perl 5.14 or later):

perl -pe '$_ = join "",
        map +($i++ % 2 == 0 ? $_ : s/\*//gr),
        split /(<[^>]+>)/, $_;' FILE

Addendum, a shorter variant:

perl -pe 's/(<[^>]+>)/$1 =~ s(\*)()gr/ge' FILE

Upvotes: 3

bartimar

Reputation: 3534

A simple solution if each tag contains only one asterisk:

sed 's/<\([^>]*\)\*\([^>]*\)>/<\1\2>/g'

If a tag can contain more than one asterisk, you can loop using sed's labels and branches:

sed ':doagain s/<\([^>]*\)\*\([^>]*\)>/<\1\2>/g; t doagain'

Here doagain is the loop label, and t doagain is a conditional jump back to that label. Each pass removes one asterisk per tag, so the loop repeats until no substitution succeeds. From the sed manual:

t label

 Branch to label only if there has been a successful substitution since the last 
 input line was read or conditional branch was taken. The label may be omitted, in 
 which case the next cycle is started.
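
To see the loop in action, here is a small sketch on a line whose last tag holds three asterisks. The function name strip_tag_stars_sed is hypothetical. Splitting the script into separate -e expressions is a portability choice on my part, since sed implementations differ in how they decide where a label ends.

```shell
# strip_tag_stars_sed is a hypothetical helper name, not part of the answer.
# :doagain   marks the top of the loop.
# s/.../g    removes one * from every <...> tag that still contains one.
# t doagain  jumps back only if that substitution changed something.
strip_tag_stars_sed() {
    sed -e ':doagain' \
        -e 's/<\([^>]*\)\*\([^>]*\)>/<\1\2>/g' \
        -e 't doagain' "$@"
}

printf '%s\n' 'I want a lot <*tag 1> more <*tag 2>*cheese *cakes. <*better *tag X*>' | strip_tag_stars_sed
# prints: I want a lot <tag 1> more <tag 2>*cheese *cakes. <better tag X>
```

The last tag needs three passes, one per asterisk; a fourth pass matches nothing, so t falls through and the line is printed.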

Upvotes: 3

Kent

Reputation: 195059

awk could solve your problem (GNU awk is needed, because the four-argument split used here is a gawk extension):

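awk '{r="";x=split($0,a,/<[^>]*>/,s);for(i in s)gsub(/\*/,"",s[i]);for(j=1;j<=x;j++)r=r a[j] s[j]; print r}' file

A more readable version:

 awk '{r=""                           # reset the result for every input line
       x=split($0,a,/<[^>]*>/,s)      # a: text between tags, s: the tags themselves
       for(i in s)gsub(/\*/,"",s[i])  # delete the asterisks inside each tag
       for(j=1;j<=x;j++)r=r a[j] s[j] # stitch text and tags back together
       print r}' file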

Test with your data:

kent$  cat file
I want a lot <*tag 1> more <*tag 2>*cheese *cakes. <*better *tag X*>

kent$  awk '{r="";x=split($0,a,/<[^>]*>/,s);for(i in s)gsub(/\*/,"",s[i]);for(j=1;j<=x;j++)r=r a[j] s[j]; print r}' file
I want a lot <tag 1> more <tag 2>*cheese *cakes. <better tag X>
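
If gawk is not available, the same idea can probably be expressed with match() and substr(), which every POSIX awk provides. This is a swapped-in technique, not the split-based approach above, and the function name strip_tag_stars_awk is hypothetical.

```shell
# strip_tag_stars_awk is a hypothetical helper name, not part of the answer.
strip_tag_stars_awk() {
    awk '{
        out = ""; rest = $0
        # match() sets RSTART and RLENGTH for the leftmost <...> in the remaining text
        while (match(rest, /<[^>]*>/)) {
            tag = substr(rest, RSTART, RLENGTH)
            gsub(/\*/, "", tag)                        # delete asterisks inside the tag only
            out = out substr(rest, 1, RSTART - 1) tag  # keep the text before the tag as is
            rest = substr(rest, RSTART + RLENGTH)      # continue after the tag
        }
        print out rest                                 # text after the last tag is unchanged
    }' "$@"
}

printf '%s\n' 'I want a lot <*tag 1> more <*tag 2>*cheese *cakes. <*better *tag X*>' | strip_tag_stars_awk
# prints: I want a lot <tag 1> more <tag 2>*cheese *cakes. <better tag X>
```

Because out and rest are reassigned at the top of the block, each input line is processed independently.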

Upvotes: 1
