retrogenomics

Reputation: 23

variable string conversion with sed

I'm trying to use sed to convert parts of a string, depending on their context.

string="GAGGTGGGTGGGGAGC"
echo $string | sed -r 's/G+([AT])/A+\1/g'

The result is A+AA+TA+TA+AGC, but I expect AAAATAAATAAAAAGC.

In other words, I would like to replace a run of Gs of unknown length with the same number of As, but only when the run is followed by an A or a T. How can I recover the number of Gs in each matched run and reuse it in the replacement?
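For reference, the intended rule can be stated without sed: a G becomes an A exactly when the first non-G character to its right is an A or a T. The sketch below checks that rule by scanning the string from right to left. It assumes bash, and convert_g_runs is a hypothetical helper name that is not part of the question. It only makes the expected output unambiguous and is not a sed solution.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the question) that spells out the intended rule:
# a G becomes an A exactly when the first non-G character to its right is A or T.
convert_g_runs() {
  local s=$1 out="" c i
  local follow=0   # 1 while the next non-G character to the right is A or T
  for (( i = ${#s} - 1; i >= 0; i-- )); do
    c=${s:i:1}
    case $c in
      A|T) follow=1 ;;                      # Gs immediately to the left may convert
      G) if (( follow )); then c=A; fi ;;   # a G keeps the current follow state
      *) follow=0 ;;                        # any other character breaks the chain
    esac
    out=$c$out
  done
  printf '%s\n' "$out"
}

convert_g_runs "GAGGTGGGTGGGGAGC"   # prints AAAATAAATAAAAAGC
```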

Upvotes: 2

Views: 502

Answers (1)

SLePort

Reputation: 15461

With sed, you can use backreferences and the t (test) command, which jumps back to the start of the script for another replacement whenever a substitution succeeds:

$ sed ':a;s/G\([AT]\)\(.*\)/A\1\2/;ta;' <<< "GAGGTGGGTGGGGAGC"
AAAATAAATAAAAAGC
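The one-liner relies on GNU sed accepting a semicolon right after a label name. POSIX sed ends a label only at a newline, so some other implementations (BSD/macOS sed, for instance) may reject it. The sketch below passes the same script as separate -e expressions, which sed treats as separate lines. It assumes a shell with here-strings, such as bash.

```bash
# Same substitution as the one-liner above. Each -e expression acts as its own
# script line, so the label "a" ends without relying on GNU's semicolon extension.
sed -e ':a' -e 's/G\([AT]\)\(.*\)/A\1\2/' -e 'ta' <<< "GAGGTGGGTGGGGAGC"
# prints AAAATAAATAAAAAGC
```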

How it works:

  • :a: a label marking the start of the loop
  • s: the substitute command
  • G\([AT]\): matches a G followed by an A or a T. The second letter is captured so it can be reinserted in the replacement string through a backreference
  • \(.*\): captures the remaining characters
  • A\1\2: replaces the match with an A followed by the two captured strings (the A or T, then the remaining characters)
  • ta: if the previous substitution succeeded, branches back to label :a (the beginning of the script) to look for further replacements
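Because s without the g flag rewrites only the leftmost G that sits directly before an A or a T, each pass of the loop converts a single G. A run such as GGGT is therefore converted from right to left (GGAT, then GAAT, then AAAT). The \(.*\) capture only copies the rest of the line back unchanged, so it can be dropped without changing the result. The sketch below prints the pattern space on every pass to make this visible: -n suppresses the automatic print, and p prints each intermediate state. It assumes bash for the here-string.

```bash
# Print the pattern space at the start of every pass; the last line is the result.
# The substitution omits the \(.*\) capture, which only restored the rest of the line.
sed -n -e ':a' -e 'p' -e 's/G\([AT]\)/A\1/' -e 'ta' <<< "GAGGTGGGTGGGGAGC"
```

This prints 11 lines, one for the input and one after each of the 10 substitutions. The last line is AAAATAAATAAAAAGC. Along the way, the final GGGGA run passes through GGGAA, GGAAA and GAAAA.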

Upvotes: 3
