Reputation: 1213
I want to compute the reverse complement of a DNA string in Perl. That is simple enough, and I have the following expression:
$revcomp =~ tr/ACGTacgt[]N/TGCAtgca][./;
followed by reversing the string. Swapping [ and ] takes care of ambiguous characters. However, if I want to extend this to more complex expressions, this simple scheme fails. For example, C[AG]{7,10}[ACGT]{5,8}ATGC
results in the regular expression GCAT{8,5}[ACGT]{01,7}[CT]G (even after the curly braces are also swapped), which is not what I want. The expected reverse complement would be GCAT[ACGT]{5,8}[CT]{7,10}G. How could I go about this?
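For reference, here is a minimal runnable sketch of the character-wise scheme from the question, assuming the curly braces are added to the tr lists as described. It reproduces the broken result: reversing the whole string flips the digits inside each quantifier and moves each quantifier in front of the element it belongs to.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;

my $pattern = 'C[AG]{7,10}[ACGT]{5,8}ATGC';

# Complement the bases and swap every bracket and brace, because reversing
# the string turns each opening delimiter into a closing one.
(my $revcomp = $pattern) =~ tr/ACGTacgt[]{}N/TGCAtgca][}{./;

# Reverse the whole string character by character (scalar context).
$revcomp = reverse $revcomp;

say $revcomp;   # GCAT{8,5}[ACGT]{01,7}[CT]G
```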
Upvotes: 2
Views: 969
Reputation: 46187
To get regexes with quantifiers to work correctly, you need to reverse the expression element-wise rather than character-wise. By "element-wise", I mean that a single character or character class, together with its following quantifier (if there is one), must be treated as a single unit. For example, in your pattern C[AG]{7,10}[ACGT]{5,8}ATGC there are 7 elements: C-[AG]{7,10}-[ACGT]{5,8}-A-T-G-C. You need to break the expression into that list of elements and reverse the order of the list, rather than reversing it as a single string.
ETA: Code
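#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
my $re = 'C[AG]{7,10}[ACGT]{5,8}ATGC';
# Complement each base; brackets and braces are left untouched.
$re =~ tr/ACGTacgt/TGCAtgca/;
# Split into elements: one character or one [...] class, plus an optional {m,n} quantifier.
my @elem = $re =~ /((?:\[.*?\]|.)(?:\{.*?\})?)/g;
# Reverse the order of the elements, not the characters inside them.
my $rev = join '', reverse @elem;
say $rev;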
Output:
GCAT[TGCA]{5,8}[TC]{7,10}G
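The same idea can be wrapped in a reusable subroutine. This is a sketch under stated assumptions, not part of the answer above: the name revcomp_pattern is hypothetical, the element regex uses negated classes ([^\]]* and [^}]*) instead of the lazy .*?, and it also accepts *, + and ? as quantifiers. Unlike the tr in the question, it leaves N as N (N is its own complement) instead of mapping it to ., because inside a [...] class a . is a literal dot. Groups, alternation and lazy quantifiers such as {5,8}? are not handled.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: reverse-complement a simple DNA pattern built from
# single bases and [...] classes, each optionally followed by a {m,n},
# *, + or ? quantifier. Groups, alternation and lazy quantifiers are not handled.
sub revcomp_pattern {
    my ($pat) = @_;

    # Complement the bases in both cases; brackets, braces and N stay as-is
    # (N is its own complement).
    (my $comp = $pat) =~ tr/ACGTacgt/TGCAtgca/;

    # Split into elements: one character or one [...] class, together with
    # the quantifier that follows it, if any.
    my @elem = $comp =~ /((?:\[[^\]]*\]|.)(?:\{[^}]*\}|[*+?])?)/g;

    # Reverse the order of the elements; each element keeps its own text.
    return join '', reverse @elem;
}

say revcomp_pattern('C[AG]{7,10}[ACGT]{5,8}ATGC');   # GCAT[TGCA]{5,8}[TC]{7,10}G
say revcomp_pattern('AN{3}T+');                      # A+N{3}T
```

Because each element is kept intact, the quantifier digits and class contents never get reversed, so no bracket or brace swapping is needed.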
Upvotes: 2