Reputation: 115
I'm trying to construct a regular expression that will match a repeating DNA sequence of 2 characters. These characters can be the same.
The regex should match a 2-character sequence that repeats at least 3 times in a row. Here are some examples:
regex should match on:
and should not match on:
So far I've come up with the following regular expressions:
[ACGT]{2}
This matches any sequence of exactly two characters (A, C, G or T). Now I want to repeat this pattern at least three times, so I tried the following regular expressions:
[ACGT]{2}{3,}
([ACGT]{2}){3,}
Unfortunately, the first one raises a 'multiple repeat' error (Python), while the second one simply matches any run of 6 or more characters consisting of A, C, G and T, whether or not the same pair is repeated.
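To make the problem concrete, here is a small sketch of the two attempts in Python's re module. The variable names are just for illustration. A quantified group re-evaluates its contents on every repetition, so each pair is free to differ from the previous one.

```python
import re

# Attempt 1: a quantifier stacked directly on another quantifier.
# Python's re rejects this at compile time; keep the message for inspection.
try:
    re.compile(r"[ACGT]{2}{3,}")
    stacked_error = None
except re.error as exc:
    stacked_error = str(exc)

# Attempt 2: grouping makes the pattern legal, but the character class is
# evaluated afresh on each repetition, so the pairs need not be identical.
grouped = re.compile(r"([ACGT]{2}){3,}")

matches_true_repeat = grouped.fullmatch("ATATAT") is not None
matches_non_repeat = grouped.fullmatch("ACGTAC") is not None
```

Both flags come out true, which is the behaviour described above: the grouped pattern cannot tell a real repeat from three unrelated pairs.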
Is there anyone that can help me out with this regular expression? Thanks in advance.
Upvotes: 4
Views: 3970
Reputation: 71538
You could perhaps make use of backreferences.
([ATGC]{2})\1{2,}
\1 is a backreference to the first capture group: it matches the exact text that group captured, so \1{2,} requires at least two more copies of the same pair (three or more in total).
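A minimal sketch of how the backreference pattern might be used in Python follows. The helper name find_repeats and the sample sequences are assumptions made for illustration; they do not come from the question.

```python
import re

# Capture two bases once, then require the same two bases at least twice more.
TANDEM_REPEAT = re.compile(r"([ACGT]{2})\1{2,}")

def find_repeats(sequence):
    """Return (start, unit, copies) for each dinucleotide repeat found."""
    results = []
    for m in TANDEM_REPEAT.finditer(sequence):
        unit = m.group(1)                # the captured pair, e.g. "AT"
        copies = len(m.group(0)) // 2    # whole match length / pair length
        results.append((m.start(), unit, copies))
    return results

print(find_repeats("GGATATATCC"))  # [(2, 'AT', 3)]
print(find_repeats("CCCCCC"))      # [(0, 'CC', 3)]
print(find_repeats("ACGTAC"))      # []
```

Because the regex engine scans left to right, the reported unit is whichever pair starts the leftmost match, so a stretch such as TTATATAT is reported as TA repeated rather than AT.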
Upvotes: 8
Reputation: 5918
One:
(AT){3}
Two:
(GA){4}
Three:
C{6}
Combining them!
(C{6}|(GA){4}|(AT){3})
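As a quick check of what this combined pattern accepts, here is a small Python sketch. The function name is_listed_repeat and the test strings are assumptions for illustration. The alternation only recognises the three repeats spelled out in it, each with its own minimum count.

```python
import re

# The combined pattern from above: three hard-coded alternatives.
fixed = re.compile(r"(C{6}|(GA){4}|(AT){3})")

def is_listed_repeat(sequence):
    """True if the sequence contains one of the three hard-coded repeats."""
    return fixed.search(sequence) is not None

print(is_listed_repeat("ATATAT"))    # True: AT three times
print(is_listed_repeat("GAGAGAGA"))  # True: GA four times
print(is_listed_repeat("GAGAGA"))    # False: GA only three times
print(is_listed_repeat("TGTGTG"))    # False: TG is not one of the alternatives
```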
Upvotes: 0