PesaThe

Reputation: 7509

Replace multiple occurrences between two strings

I need to replace every character a between xx and zz with hello:

#input
a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz
#output
a xxhellob hellobzz ca xxbczz aaa axxhellozza xxczzaxxczz

This works for one pair, but not for several xx/zz pairs (it replaces every a between the first xx and the last zz):

sed -r ':rep; s/(xx.*)a(.*zz)/\1hello\2/; trep'
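A minimal sketch that runs this loop on the sample input makes the failure visible. The function name greedy_attempt is hypothetical; -E is used as the equivalent of GNU sed's -r, and the label and branch are passed as separate -e expressions (an assumption for portability; GNU sed also accepts the one-string form above). Because .* can run across several pairs, every a that has some xx to its left and some zz to its right is eventually replaced, including the ones in ca and aaa:

```shell
# Hypothetical wrapper around the greedy loop from the question.
# (xx.*) and (.*zz) may span several xx...zz pairs, so each pass replaces
# an a that merely has an xx somewhere before it and a zz somewhere after it.
greedy_attempt() {
  sed -E -e ':rep' -e 's/(xx.*)a(.*zz)/\1hello\2/' -e 'trep'
}

printf '%s\n' 'a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz' | greedy_attempt
```

On the sample line this prints a xxhellob hellobzz chello xxbczz hellohellohello helloxxhellozzhello xxczzhelloxxczz. Only the first a, which has no xx before it, survives.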

I assume the best approach is to use a more advanced regex flavor, such as Perl's.

I am looking for a solution in bash, sed, awk or perl. Is this task even possible with basic/extended regex? Solutions that stay readable when the delimiters are longer (for example xxxxxx/zzzzzz) are preferred.

Upvotes: 5

Views: 1287

Answers (6)

zdim

Reputation: 66899

Here is a different approach, if nothing else for comparison with a one-line regex that uses a somewhat more advanced feature (the /e modifier, as in mkHun's answer).

Split the string on xx. In each term that contains zz, replace a in the part up to zz. Then reassemble (join back) the string.

I replace a with - so the result is easy to review. The begin and end patterns are in $pb and $pe.

perl -wE'
    $_ = shift // q(a xxab abzz ca xxbczz aaa axxazza); 
    say; 
    $pb = qr/xx/; $pe = qr/zz/;
    
    ($s, @t) = split /($pb)/; 

    for (@t) { 
        next if /^$pb$/; 
        next if not @m = /(.+?) ($pe.*)/x; 

        $_ = $m[0] =~ s/a/-/gr . $m[1] 
    }; 
    say $s . join "", @t
'

Comments

  • You can test other strings by passing them on the command line: perl -wE'...' string

  • I use qr to build the patterns that begin and end the section of interest, in case they are more complex and need a regex.

  • The capturing parentheses in split's separator pattern (/($pb)/) make split also return each separator, in its place among the other parts.

  • Each term must contain zz (otherwise there is no xx ... zz pair, so no replacement) and something before zz (otherwise there is nothing to do).

  • The zz can be followed by more text, up to the next xx (on which we split).

  • Elements of the array of terms are changed in place (by assigning to $_, which aliases each element).

This is written as a command-line one-liner, but it would be better as a script. It prints (comments added):

a xxab abzz ca xxbczz aaa axxazza    # original string
a xx-b -bzz ca xxbczz aaa axx-zza    # with replacements

I've tested a few more strings, but by all means test more.
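The same split-and-rejoin idea can be carried over to awk, which the question also lists; this is a translation, not part of the original answer. In this sketch the function name replace_between_awk is hypothetical, and the begin marker is passed to split(), which treats a multi-character separator as a regular expression: fine for literal markers such as xx or xxxxxx, but metacharacters would need escaping. As in the Perl code, only the text before the first end marker in each piece is changed:

```shell
# Hypothetical helper: split each line on the begin marker ($1, default xx)
# and, in every piece that contains the end marker ($2, default zz),
# replace a with hello in the text before that marker.
replace_between_awk() {
  awk -v b="${1:-xx}" -v e="${2:-zz}" '
  {
    n = split($0, part, b)          # b is used as a regex by split()
    out = part[1]                   # text before the first b is left alone
    for (i = 2; i <= n; i++) {
      p = index(part[i], e)         # literal search for the end marker
      if (p > 0) {
        head = substr(part[i], 1, p - 1)
        gsub(/a/, "hello", head)    # only the part before the end marker
        part[i] = head substr(part[i], p)
      }
      out = out b part[i]           # rejoin with the original marker
    }
    print out
  }'
}

printf '%s\n' 'a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz' | replace_between_awk
printf '%s\n' 'a xxxxxxab abzzzzzz ca' | replace_between_awk xxxxxx zzzzzz
```

Unlike Perl's split, awk's split keeps a trailing empty field, so a line ending in the begin marker is rejoined unchanged.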

Upvotes: 1

mkHun

Reputation: 5927

You can try this Perl approach:

perl -E '$_="a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz";
s{xx(.+?)zz}{"xx".$1=~s/a/hello/gr."zz"}xge; 
say $_ ; '

Explanation

s{
   xx(.+?)zz #grouping the content
 }
 {
   "xx".$1=~s/a/hello/gr."zz" #again making the substitution for $1 and concatenating `xx` and `zz`  
 }xge;

Flags

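g -> global

x -> extended (allows whitespace and comments in the pattern)

e -> eval (the replacement is run as Perl code)

r -> non-destructive (used on the inner substitution: returns the modified copy instead of changing $1)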

With lookarounds:

perl -E '$_="a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz";
s{(?<=xx)(.+?)(?=zz)}{$1=~s/a/hello/gr}xge; 
say $_ ; '

Upvotes: 3

Hynek -Pichi- Vychodil

Reputation: 26121

Yes, Perl is the best tool for this:

perl -pe's/xx(.+?)zz/"xx".$1=~s|a|hello|gr."zz"/ge' file.txt

Upvotes: 3

Casimir et Hippolyte

Reputation: 89584

You have to describe everything that isn't zz (a character that isn't z, or a z followed by another character) before and after the a, up to the zz, and use a label and a conditional branch (t) to reprocess the line until no a is left between xx and zz:

sed -E ':a;s/(xx([^z]|z[^z])*z?)a(([^z]|z[^z])*zz)/\1hello\3/g;ta' file
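A quick way to check the sed command above against the question's sample line. The function name is hypothetical, and the script is split into separate -e expressions, an assumption made so the label also parses outside GNU sed (the one-string form works in GNU sed). Note that the t loop only terminates because the replacement contains no a: each pass replaces at least one a inside a pair and introduces none.

```shell
# Hypothetical wrapper around the sed answer above.
# ([^z]|z[^z])* cannot run through a zz, so a match never leaves its own
# xx...zz section; the t loop repeats until no a is left in any section.
casimir_sed() {
  sed -E -e ':a' -e 's/(xx([^z]|z[^z])*z?)a(([^z]|z[^z])*zz)/\1hello\3/g' -e 'ta' "$@"
}

printf '%s\n' 'a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz' | casimir_sed
```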

A Perl way:

perl -pe's/(?:\G(?!^)|xx(?=.*zz))[^za]*(?:z(?!z)[^za]*)*\Ka/hello/g' file

It can easily be changed to deal with xxxxxx and zzzzzz:

perl -pe's/(?:\G(?!^)|xxxxxx(?=.*zzzzzz))[^za]*(?:z(?!zzzzz)[^za]*)*\Ka/hello/g' file

Upvotes: 0

potong

Reputation: 58483

This might work for you (GNU sed):

sed -r ':a;s/zz/\n/;:b;tb;s/(xx[^\na]*)a([^\n]*\n)/\1hello\2/;tb;/zz/ba;s/\n/zz/g' file

This replaces each zz in turn with a newline, replaces any a between an xx and that newline with hello, and finally turns the newlines back into zz.

N.B. Any number of xx that are not paired with a zz may appear, and any a's between them (up to the next zz) will also be substituted.
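Spread over several lines with comments, the same GNU sed script reads as follows. The function name is hypothetical, -E is used as the equivalent of -r, and \n inside bracket expressions relies on GNU sed. The :b followed by tb only clears the t flag, so that the next tb reacts just to the a substitution:

```shell
# Hypothetical multi-line version of the GNU sed answer above.
potong_sed() {
  sed -E '
:a
# Mark the end of the next pair by turning its zz into a newline.
s/zz/\n/
:b
# Clear the t flag left by an earlier substitution.
tb
# Replace one a that follows an xx and precedes a newline marker.
s/(xx[^\na]*)a([^\n]*\n)/\1hello\2/
tb
# Another zz left? Handle the next pair.
/zz/ba
# Turn the markers back into zz.
s/\n/zz/g
' "$@"
}

printf '%s\n' 'a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz' | potong_sed
```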

Upvotes: 2

DjLegolas

Reputation: 76

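Your problem is the .*: . matches every character, including whitespace. Use \S instead, which matches only non-whitespace characters. (Note that sed has no lazy quantifiers, so \S*? acts like \S*. This only works when no pair contains whitespace and neighboring pairs are separated by it, which is why the example below uses xxababzz; with the question's full input, xxab abzz is left unchanged and the a in xxczzaxxczz is replaced.)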

$ echo 'a xxababzz ca xxbczz aaa axxazza' | sed -r ':rep; s/(xx\S*?)a(\S*?zz)/\1hello\2/; trep'
a xxhellobhellobzz ca xxbczz aaa axxhellozza

Upvotes: 0
