Reputation: 7509
I need to replace every character a between xx and zz with hello:
#input
a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz
#output
a xxhellob hellobzz ca xxbczz aaa axxhellozza xxczzaxxczz
This works for one pair, but it doesn't work for more xx/zz pairs (it replaces every a between the first xx and the last zz):
sed -r ':rep; s/(xx.*)a(.*zz)/\1hello\2/; trep'
I assume the best approach is to use a more advanced regex flavor, such as Perl's. I am looking for a solution in bash, sed, awk or perl. Is this task even possible with basic/extended regex? Solutions that do not become hard to digest when the delimiters are longer (for example xxxxxx/zzzzzz) are preferred.
Upvotes: 5
Views: 1287
Reputation: 66899
Here is a different approach, if nothing else for comparison with a one-line regex that uses a somewhat more advanced feature (the /e modifier, as in mkHun's answer).
Split the string on xx. Iterate over the terms and, if a term contains zz, replace a in the part of that term up to zz. Then reassemble (join back) the string.
I replace a with - for easier reviewing. The begin and end patterns are in $pb and $pe:
perl -wE'
$_ = shift // q(a xxab abzz ca xxbczz aaa axxazza);
say;
$pb = qr/xx/; $pe = qr/zz/;
($s, @t) = split /($pb)/;
for (@t) {
next if /^$pb$/;
next if not @m = /(.+?) ($pe.*)/x;
$_ = $m[0] =~ s/a/-/gr . $m[1]
};
say $s . join "", @t
'
Comments
You can test with other strings by passing one on the command line: perl -wE'...' string
I use qr to assign patterns that begin and end the section of interest in case they are more complex and need a regex.
The capturing parens in split's separator pattern (/($pb)/) make it return the separators as well, each in its place among the other parts.
In each term we need both zz (otherwise there is no xx ... zz, so no replacement) and something before zz (otherwise there is nothing to do).
The zz can be followed by more text, up to the next xx (on which we split).
Elements of the array with terms are changed in place by assigning to $_ (the for loop aliases $_ to each element).
This is written as a command-line program, ready to run, but it would be better as a script. It prints (comments added):
a xxab abzz ca xxbczz aaa axxazza    # original string
a xx-b -bzz ca xxbczz aaa axx-zza    # with replacements
I've tested with a few more strings, but by all means please test more.
Upvotes: 1
Reputation: 5927
You can try this Perl method:
perl -E '$_="a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz";
s{xx(.+?)zz}{"xx".$1=~s/a/hello/gr."zz"}xge;
say $_ ; '
Explanation
s{
xx(.+?)zz #grouping the content
}
{
"xx".$1=~s/a/hello/gr."zz" #again making the substitution for $1 and concatenating `xx` and `zz`
}xge;
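Flags
g -> global
r -> non-destructive modifier (the inner substitution returns a modified copy and leaves $1 alone)
e -> eval (the replacement part is run as Perl code)
x -> extended (whitespace in the pattern is ignored)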
With lookarounds, xx and zz are not part of the match, so they do not need to be added back:
perl -E '$_="a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz";
s{(?<=xx)(.+?)(?=zz)}{$1=~s/a/hello/gr}xge;
say $_ ; '
Upvotes: 3
Reputation: 26121
Yes, it's best to use Perl:
perl -pe's/xx(.+?)zz/"xx".$1=~s|a|hello|gr."zz"/ge' file.txt
Upvotes: 3
Reputation: 89584
You have to describe everything that isn't zz (a character that isn't a z, or a z followed by a character other than z) before and after the a, up to the zz, and use a label and a conditional test to process the line until there are no more a's between xx and zz:
sed -E ':a;s/(xx([^z]|z[^z])*z?)a(([^z]|z[^z])*zz)/\1hello\3/g;ta' file
A Perl way:
perl -pe's/(?:\G(?!^)|xx(?=.*zz))[^za]*(?:z(?!z)[^za]*)*\Ka/hello/g' file
that can be easily changed to:
perl -pe's/(?:\G(?!^)|xxxxxx(?=.*zzzzzz))[^za]*(?:z(?!zzzzz)[^za]*)*\Ka/hello/g' file
to deal with xxxxxx and zzzzzz.
Upvotes: 0
Reputation: 58483
This might work for you (GNU sed):
sed -r ':a;s/zz/\n/;:b;tb;s/(xx[^\na]*)a([^\n]*\n)/\1hello\2/;tb;/zz/ba;s/\n/zz/g' file
This replaces the first zz with a newline, then replaces every a between xx and that newline with hello. It repeats this for each remaining zz and finally turns the newlines back into zz.
N.B. There can be any number of xx that are not paired with a zz; any a's between such an xx and the next zz will be substituted as well.
Upvotes: 2
Reputation: 76
Your problem is with the .*, as . will match every character, including whitespace. You should use \S (a GNU sed extension) instead, as it matches only non-whitespace characters:
$ echo 'a xxababzz ca xxbczz aaa axxazza' | sed -r ':rep; s/(xx\S*?)a(\S*?zz)/\1hello\2/; trep'
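a xxhellobhellobzz ca xxbczz aaa axxhellozza
Note that this only works when there is no whitespace between xx and zz (the sample uses xxababzz instead of the question's xxab abzz), and it does not stop at the first zz, so the a in xxczzaxxczz from the question's input would be replaced too. Also, sed has no non-greedy quantifiers, so *? behaves like * here.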
Upvotes: 0