user2388809

Reputation: 115

regex: matching a repeating sequence

I'm trying to construct a regular expression that will match a repeating DNA sequence of 2 characters. These characters can be the same.

The regex should match when the same 2-character unit is repeated at least 3 times in a row. Here are some examples:

regex should match on:

and should not match on:

So far I've come up with the following regular expressions:

[ACGT]{2}

This matches any sequence of exactly two characters (A, C, G or T). Now I want to repeat this pattern at least three times, so I tried the following regular expressions:

[ACGT]{2}{3,}
([ACGT]{2}){3,}

Unfortunately, the first one raises a 'multiple repeat' error (Python), while the second one simply matches any sequence of 6 or more characters consisting of A, C, G and T, even when the pairs differ from each other.
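To make the two failure modes concrete, here is a minimal sketch using Python's standard `re` module. The variable names (`stacked_error`, `grouped`, and so on) and the test strings are only illustrative and are not part of the original attempts.

```python
import re

# Stacking two quantifiers directly is rejected by Python's re module.
try:
    re.compile(r"[ACGT]{2}{3,}")
    stacked_error = None
except re.error as exc:
    stacked_error = str(exc)

# Grouping the pair compiles, but every repetition of the group may match
# a different pair, so a non-repeating sequence is accepted as well.
grouped = re.compile(r"([ACGT]{2}){3,}")
matches_mixed = grouped.fullmatch("ACGTAC") is not None   # three different pairs
matches_repeat = grouped.fullmatch("ATATAT") is not None  # one pair, three times

print(stacked_error)
print(matches_mixed, matches_repeat)
```

Both `fullmatch` calls succeed, which shows that the grouped version cannot tell a true repeat from six arbitrary bases.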

Is there anyone that can help me out with this regular expression? Thanks in advance.

Upvotes: 4

Views: 3970

Answers (2)

Jerry

Reputation: 71538

You could perhaps make use of backreferences.

([ATGC]{2})\1{2,}

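\1 is a backreference to the first capture group: it matches the exact text that group captured, not just the same pattern. So \1{2,} requires at least two more copies of the captured pair, giving three or more in total.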
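A short sketch of how the backreference pattern behaves in Python. The helper `find_repeats` and the sample sequences are made up for illustration; only the pattern itself comes from the answer.

```python
import re

# The pattern from the answer: capture one pair, then require the same
# captured text at least two more times.
PATTERN = re.compile(r"([ATGC]{2})\1{2,}")

def find_repeats(seq):
    """Return (matched_text, start_index) for every repeat run in seq."""
    return [(m.group(0), m.start()) for m in PATTERN.finditer(seq)]

print(find_repeats("GGATATATCC"))  # AT repeated three times, starting at index 2
print(find_repeats("CCCCCC"))      # the pair may consist of two identical bases
print(find_repeats("ACGTAC"))      # three different pairs: no match
print(find_repeats("ATAT"))        # only two repetitions: no match
```

Because `finditer` scans the whole string, this also works when the repeat is embedded in a longer sequence.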

regex101 demo

Upvotes: 8

Alec Teal

Reputation: 5918

One:

(AT){3}

Two:

(GA){4}

Three:

C{6}

Combining them!

(C{6}|(GA){4}|(AT){3})
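For comparison, a small sketch of what the combined pattern covers. It only knows the three hard-coded cases, so other repeating pairs are not matched. The names `FIXED` and `GENERAL` and the test strings are illustrative; `GENERAL` is the backreference pattern from the other answer, written with the question's character class.

```python
import re

# The combined pattern from this answer: three fixed alternatives.
FIXED = re.compile(r"(C{6}|(GA){4}|(AT){3})")

# A backreference pattern that accepts any pair repeated 3+ times.
GENERAL = re.compile(r"([ACGT]{2})\1{2,}")

for seq in ["ATATAT", "GAGAGAGA", "CCCCCC", "GTGTGT", "GAGAGA"]:
    fixed_hit = FIXED.search(seq) is not None
    general_hit = GENERAL.search(seq) is not None
    print(seq, fixed_hit, general_hit)
```

The last two sequences are matched only by the general pattern: GT is not among the listed alternatives, and GA is required four times by the fixed one.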

Upvotes: 0
