Reputation: 408
I've been trying to do the following: given a character like "i", find and remove the second of every pair of "i" characters (the pairs must not overlap).
"I am so irritated with regex. Seriously" -> "I am so rritated wth regex. Seriously".
I almost found a solution using a positive lookbehind, but its matches overlap :(
Can anyone help me?
My best attempt was this (I think): "(?<=i).*?(i)"
EDIT: My example is wrong. I am supposed to remove the SECOND item of each pair, so the result should have been: "I am so irrtated with regex. Serously"
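As a cross-check of the EDITed requirement, here is a minimal regex-free sketch in Python, the same language as the re code in the answer. The function name drop_every_second is hypothetical. It walks the string once and drops every second occurrence of the character (the 2nd, 4th, 6th, ...), which is the same as dropping the second item of each non-overlapping pair:

```python
def drop_every_second(text, ch):
    """Drop the 2nd, 4th, 6th, ... occurrence of ch, i.e. the second item
    of each non-overlapping pair, without using a regex."""
    out = []
    pair_open = False  # True after the first item of a pair has been kept
    for c in text:
        if c == ch:
            if pair_open:
                pair_open = False
                continue  # second item of the pair: skip it
            pair_open = True
        out.append(c)
    return "".join(out)


# The capital "I" is not equal to "i", so it is left alone.
print(drop_every_second("I am so irritated with regex. Seriously", "i"))
# I am so irrtated with regex. Serously
```

A loop like this is handy for checking what a regex-based solution should output on a given input.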
Upvotes: 0
Views: 139
Reputation: 626738
Your regex matches overlapping substrings because of the lookbehind (?<=i). You need to use a consuming pattern for non-overlapping matches:
i([^i]*i)
Replace with the \1 backreference to the text captured with ([^i]*i).
See the regex demo.
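To see the overlap described above, this small sketch prints the match spans that re.finditer reports for the question's lookbehind pattern and for the consuming pattern, on the question's test string:

```python
import re

text = "I am so irritated with regex. Seriously"

# The question's attempt: (?<=i) only checks the preceding "i" without
# consuming it, so the "i" that ends one match also satisfies the
# lookbehind of the next match.
lookbehind = re.compile(r"(?<=i).*?(i)")

# The answer's pattern: both "i" characters are part of the match itself.
consuming = re.compile(r"i([^i]*i)")

# Positions of every lowercase "i" (the capital "I" is not matched).
print([m.start() for m in re.finditer("i", text)])   # [8, 11, 19, 33]
print([m.span() for m in lookbehind.finditer(text)])  # [(9, 12), (12, 20), (20, 34)]
print([m.span() for m in consuming.finditer(text)])   # [(8, 12), (19, 34)]
```

With the lookbehind, the i at index 11 ends the first match and then serves as the lookbehind for the second one (likewise the i at 19), so four i characters yield three overlapping "pairs". The consuming pattern yields exactly two disjoint pairs: the i characters at 8 and 11, and at 19 and 33.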
The pattern matches:
i - a literal i. After it is matched, the regex index advances 1 char to the right (the regex engine processes the string from left to right by default; in re, there is no other option).
([^i]*i) - Group 1, matching 0+ characters other than i, followed by the next i. The whole captured value is available via .group(1). After this group matches, the regex index is past the second i, which has been matched and consumed by the whole pattern. Thus, no overlapping matches occur when the regex engine goes on to look for the remaining matches in the string.
import re
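pat = "i"
# re.escape guards against special characters such as "." or "]"; pat must be a
# single character, because it is also used inside the [^...] character set
p = re.compile('{0}([^{0}]*{0})'.format(re.escape(pat)))
test_str = "I am so irritated with regex. Seriously"
# Replacing each "i...i" match with Group 1 drops the first i of every pair
result = p.sub(r"\1", test_str)
print(result)  # I am so rritated wth regex. Seriously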
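The code above drops the first i of each pair, which matches the question's original example. For the EDITed requirement, one possible adaptation (a sketch, not the answerer's code) keeps the same consuming, non-overlapping approach but moves the second i outside the capturing group: (i[^i]*)i replaced with \1. The helper name drop_second_of_each_pair is hypothetical, and ch is assumed to be a single character because it is also placed inside [^...]:

```python
import re


def drop_second_of_each_pair(text, ch):
    r"""Remove the second ch of each non-overlapping pair.

    Group 1 holds the first ch plus the run of other characters; the second
    ch is consumed outside the group, so replacing with \1 drops it.
    Assumes ch is a single character (it is used inside [^...]).
    """
    c = re.escape(ch)
    pattern = re.compile("({0}[^{0}]*){0}".format(c))
    return pattern.sub(r"\1", text)


print(drop_second_of_each_pair("I am so irritated with regex. Seriously", "i"))
# I am so irrtated with regex. Serously
```

Because the whole pair is still consumed by each match, the engine resumes after the second i, so the pairs cannot overlap.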
Upvotes: 2