Reputation: 23
I'm trying to use sed to convert part of a string depending on its context.
string="GAGGTGGGTGGGGAGC"
echo $string | sed -r 's/G+([AT])/A+\1/g'
The result is: A+AA+TA+TA+AGC. But I expect: AAAATAAATAAAAAGC
In other words, I would like to replace a run of Gs of unknown length with the same number of As, but only if the run is followed by an A or a T. How can I recover the number of Gs in each match so that I can reuse it in the replacement?
Upvotes: 2
Views: 502
Reputation: 15461
With sed, using a backreference and the t (test) command to jump back to the beginning of the script for a further replacement whenever a substitution succeeds:
$ sed ':a;s/G\([AT]\)\(.*\)/A\1\2/;ta;' <<< "GAGGTGGGTGGGGAGC"
AAAATAAATAAAAAGC
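Some non-GNU seds may not accept a ';' directly after a label or after t, since POSIX does not list ':' and 't' among the commands that can be followed by a semicolon. A sketch of the same script with one -e per command, which should sidestep that; the result variable and the printf pipe are only there for illustration and are not part of the command above:

```shell
# Same three commands as above, each given as its own -e expression
# so that no ';' has to follow the label ':a' or the 't' command.
result=$(printf '%s\n' "GAGGTGGGTGGGGAGC" |
  sed -e ':a' -e 's/G\([AT]\)\(.*\)/A\1\2/' -e 'ta')
printf '%s\n' "$result"
```

This prints AAAATAAATAAAAAGC, the same as the one-line form.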
How it works:
:a : a label for the upcoming loops
s : the substitute command
G\([AT]\) : matches a G followed by A or T; the second letter is captured so that the replacement can reuse it through a backreference
\(.*\) : captures the remaining characters
A\1\2 : replaces the match with A followed by the previously captured strings (the A or T, then the remaining characters)
ta : if the previous substitution succeeded, go to label :a (the beginning of the script) to check for further replacements
Upvotes: 3
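To watch what each trip through the :a ... ta loop does, the substitution can be run one pass at a time from the shell. This is only a rough sketch: the while loop and the variable names s and next are illustrative and not part of the answer's command. Each pass applies the same s command once and stops when the string no longer changes, which is the point where t would stop jumping.

```shell
s="GAGGTGGGTGGGGAGC"
while :; do
  # One pass: replace the leftmost G that is directly followed by A or T.
  next=$(printf '%s\n' "$s" | sed 's/G\([AT]\)\(.*\)/A\1\2/')
  # No change means the substitution did not match, so stop looping.
  [ "$next" = "$s" ] && break
  s=$next
  # Show the intermediate string after this pass.
  printf '%s\n' "$s"
done
```

The first lines printed are AAGGTGGGTGGGGAGC, AAGATGGGTGGGGAGC and AAAATGGGTGGGGAGC, and the last is AAAATAAATAAAAAGC. Each run of Gs is converted from right to left, one G per pass, because a freshly written A makes the G before it eligible on the next pass. The final GC is left alone since that G is not followed by A or T.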