Reputation: 567
In my case, I want to capture repeated characters in text, together with at most 3 characters before and after the repeated run. For example,
original | prefix | repeat | postfix |
---|---|---|---|
1aab | 1 | aa | b |
1aaab | 1 | aaa | b |
1234aaabcde | 234 | aaa | bcd |
I wrote this regular expression in Python:
reobj = re.compile("(?P<prefix>.{0,3}) (?P<repeat>(?P<infix>[a-z])(?P=infix){1,}) (?P<postfix>.{0,3})", re.IGNORECASE | re.VERBOSE | re.DOTALL)
but it gives this result:
original | prefix | repeat | postfix | is desired? |
---|---|---|---|---|
1aab | 1 | aa | b | yes |
1aaab | 1a | aa | b | no |
1234aaabcde | 234 | aaa | bcd | yes |
Any help? Thanks.
Upvotes: 3
Views: 69
Reputation: 163372
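You can use 4 capture groups, where group infix only captures the single character to be repeated. The important change is making the prefix quantifier lazy (.{0,3}? instead of .{0,3}): a greedy prefix takes as many characters as it can, so in 1aaab it swallows the first a and leaves only aa for the repeat group, while a lazy prefix takes as few characters as possible and leaves the whole run to repeat.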
(?P<prefix>.{0,3}?)(?P<repeat>(?P<infix>[a-z])(?P=infix)+)(?P<postfix>.{0,3})
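To check this against the three inputs from the question, here is a minimal sketch. The helper name split_repeat is my own, not something from the question, and I am assuming you still want the IGNORECASE and DOTALL flags from your original pattern (VERBOSE is dropped because this pattern contains no whitespace to ignore).

```python
import re

# Pattern from the answer above. The only change from the question's
# pattern is the lazy quantifier on the prefix: {0,3}? instead of {0,3}.
PATTERN = re.compile(
    r"(?P<prefix>.{0,3}?)"
    r"(?P<repeat>(?P<infix>[a-z])(?P=infix)+)"
    r"(?P<postfix>.{0,3})",
    re.IGNORECASE | re.DOTALL,  # flags assumed from the question
)


def split_repeat(text):
    """Return (prefix, repeat, postfix) for the first repeated run, or None.

    Hypothetical helper, used here only for illustration.
    """
    m = PATTERN.search(text)
    if m is None:
        return None
    return m.group("prefix"), m.group("repeat"), m.group("postfix")


for sample in ("1aab", "1aaab", "1234aaabcde"):
    print(sample, split_repeat(sample))
# 1aab ('1', 'aa', 'b')
# 1aaab ('1', 'aaa', 'b')
# 1234aaabcde ('234', 'aaa', 'bcd')
```

Because re.search returns the leftmost match, the prefix still ends up holding up to 3 characters when that many are available: the lazy quantifier only stops it from reaching into the repeated run.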
Upvotes: 3