oyster

Reputation: 567

Greedy backreference in Python's regular expression?

I want to capture runs of repeated characters in text. At the same time, up to 3 characters before and after each run should be captured too. For example:

original      prefix  repeat  postfix
1aab          1       aa      b
1aaab         1       aaa     b
1234aaabcde   234     aaa     bcd

I wrote this regex in Python:

import re
reobj = re.compile(r"(?P<prefix>.{0,3})    (?P<repeat>(?P<infix>[a-z])(?P=infix){1,})    (?P<postfix>.{0,3})", re.IGNORECASE | re.VERBOSE | re.DOTALL)

but it gives this result:

original      prefix  repeat  postfix  desired?
1aab          1       aa      b        yes
1aaab         1a      aa      b        no
1234aaabcde   234     aaa     bcd      yes
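The table above can be reproduced with a short script. This is a minimal sketch that assumes the pattern is applied with re.search (the question does not say whether search, match or findall is used); with re.match, the third string would not match at all.

```python
import re

# The question's pattern, unchanged (the spaces are ignored because of re.VERBOSE).
reobj = re.compile(
    r"(?P<prefix>.{0,3})    (?P<repeat>(?P<infix>[a-z])(?P=infix){1,})    (?P<postfix>.{0,3})",
    re.IGNORECASE | re.VERBOSE | re.DOTALL,
)

results = {}
for text in ["1aab", "1aaab", "1234aaabcde"]:
    m = reobj.search(text)  # assumption: search() rather than match()
    # group() with several names returns a tuple of those groups
    results[text] = m.group("prefix", "repeat", "postfix")
    print(text, *results[text])
```

For 1aaab, the greedy .{0,3} first tries the prefix 1aa, fails, and backtracks only as far as 1a, which is already enough for the rest of the pattern to match.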

Any help? Thanks.

Upvotes: 3

Views: 69

Answers (1)

The fourth bird

Reputation: 163372

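You can keep the same 4 capture groups, where group infix only captures the single character that is repeated, but make the prefix quantifier lazy. A greedy .{0,3} takes as many characters as it can, and the engine backtracks only until the rest of the pattern matches; for 1aaab that already happens at prefix 1a, so one a is lost from the repeat. A lazy .{0,3}? takes as few characters as possible, so the repeat group starts at the first repeated letter. Because a search tries start positions from left to right, the match still begins up to 3 characters before the run, so the prefix is not shortened. (?P=infix)+ is the same as (?P=infix){1,}.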

(?P<prefix>.{0,3}?)(?P<repeat>(?P<infix>[a-z])(?P=infix)+)(?P<postfix>.{0,3})
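A quick check of this pattern against the three strings from the question. This is a sketch that assumes re.search is used; the question's IGNORECASE and DOTALL flags are left out here, as in the pattern above, and can be added back if needed.

```python
import re

# Same groups as in the question; the only change that matters is the lazy prefix .{0,3}?
pattern = re.compile(
    r"(?P<prefix>.{0,3}?)(?P<repeat>(?P<infix>[a-z])(?P=infix)+)(?P<postfix>.{0,3})"
)

fixed = {}
for text in ["1aab", "1aaab", "1234aaabcde"]:
    m = pattern.search(text)  # assumption: search() rather than match()
    fixed[text] = m.group("prefix", "repeat", "postfix")
    print(text, *fixed[text])
```

All three rows now match the desired output from the question, including 1aaab, which gives prefix 1 and repeat aaa.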


Upvotes: 3
