Farhat

Reputation: 1213

reversing a regular expression

I want to compute the reverse complement of a DNA string in Perl. For a plain sequence this is simple enough, and I use the following transliteration,

$revcomp =~ tr/ACGTacgt[]N/TGCAtgca][./;

followed by reversing the string. Swapping [ and ] takes care of ambiguous characters written as character classes. However, this simple scheme fails if I extend it to more complex expressions. For example, C[AG]{7,10}[ACGT]{5,8}ATGC turns into GCAT{8,5}[ACGT]{01,7}[CT]G (once the curly braces are also swapped), which is not what we want: each quantifier ends up in front of the element it should follow, and its digits are reversed. The expected reverse complement is GCAT[ACGT]{5,8}[CT]{7,10}G. How could I go about this?
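For reference, the failure can be reproduced with a short sketch. The subroutine name naive_revcomp is made up for illustration, and adding {} to the tr/// list is an assumption about how the braces would be "accounted for":

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Character-wise approach: complement the bases, swap every bracket and
# brace with its partner, then reverse the whole string.
# (naive_revcomp is a hypothetical name, not from the original post.)
sub naive_revcomp {
    my ($pattern) = @_;
    (my $out = $pattern) =~ tr/ACGTacgt[]{}N/TGCAtgca][}{./;
    return scalar reverse $out;
}

# Prints GCAT{8,5}[ACGT]{01,7}[CT]G
print naive_revcomp('C[AG]{7,10}[ACGT]{5,8}ATGC'), "\n";
```

The character classes survive because the order inside [...] does not matter, but the quantifiers do not: {7,10} comes out as {01,7} and sits before the class it was attached to.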

Upvotes: 2

Views: 969

Answers (1)

Dave Sherohman

Reputation: 46187

To get regexes with quantifiers to work correctly, you'll need to reverse the expression element-wise rather than character-wise. By "element-wise", I mean that a single character or character class, along with the quantifier that follows it (if there is one), must be treated as a single unit. In your example of C[AG]{7,10}[ACGT]{5,8}ATGC, there are 7 elements: C-[AG]{7,10}-[ACGT]{5,8}-A-T-G-C. You need to break the expression down into that list of elements and reverse the order of the list, rather than reversing it as a single string.

ETA: Code

#!/usr/bin/env perl

use strict;
use warnings;
use 5.010;

my $re = 'C[AG]{7,10}[ACGT]{5,8}ATGC';

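# Complement the bases first; brackets, braces, and digits are left untouched.
$re =~ tr/ACGTacgt/TGCAtgca/;

# Split into elements: a character class or a single character,
# each with its optional {...} quantifier attached.
my @elem = $re =~ /((?:\[.*?\]|.)(?:\{.*?})?)/g;

# Reverse the order of the elements, not the characters within them.
my $rev = join '', reverse @elem;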

say $rev;

Output:

GCAT[TGCA]{5,8}[TC]{7,10}G
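If you need this in more than one place, the same element-wise idea can be wrapped in a subroutine. This is only a sketch: the names revcomp_pattern and revcomp_sequence, the test sequence, and the N => . mapping (borrowed from the tr/// in the question) are assumptions for illustration, and it only handles bases, [...] classes, and {...} quantifiers.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: reverse-complement a pattern element by element.
sub revcomp_pattern {
    my ($pattern) = @_;

    # Complement the bases; digits, commas, brackets and braces are untouched.
    # Caveat: an N inside a [...] class would become a literal dot.
    (my $comp = $pattern) =~ tr/ACGTacgtN/TGCAtgca./;

    # One element = a [...] class or a single character, plus an optional {...}.
    my @elements = $comp =~ /((?:\[[^\]]*\]|.)(?:\{[^}]*\})?)/g;

    return join '', reverse @elements;
}

# Plain reverse complement of a concrete sequence, used only for the check.
sub revcomp_sequence {
    my ($seq) = @_;
    (my $comp = $seq) =~ tr/ACGTacgt/TGCAtgca/;
    return scalar reverse $comp;
}

my $forward = 'C[AG]{7,10}[ACGT]{5,8}ATGC';
my $reverse = revcomp_pattern($forward);
my $read    = 'CAGAGAGAACGTAATGC';    # made-up sequence that matches $forward

print "$reverse\n";
print "forward strand matches\n" if $read =~ /\A$forward\z/;
print "reverse strand matches\n" if revcomp_sequence($read) =~ /\A$reverse\z/;
```

Note that [TGCA] and [TC] are equivalent to the [ACGT] and [CT] in the expected result, since the order of characters inside a class does not matter.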

Upvotes: 2
