Eduardo Almeida

Reputation: 408

Python regex - Find and replace the second item of a pair

I've been trying to do the following: given a character like "i", find and replace the second of every pair of "i" (without overlapping).

"I am so irritated with regex. Seriously" -> "I am so rritated wth regex. Seriously". 

I almost found a solution using positive lookbehind, but it's overlapping :(

Can anyone help me?

My best was this (I think) -> "(?<=i).*?(i)"

EDIT: My description is wrong. I am supposed to replace the SECOND item of a pair, so the result should have been: "I am so irrtated with regex. Serously"

Upvotes: 0

Views: 139

Answers (1)

Wiktor Stribiżew

Reputation: 626738

Your regex produces overlapping matches because the lookbehind (?<=i) does not consume the i it checks, so the i that ends one match can also anchor the next one. You need a consuming pattern for non-overlapping matches:

i([^i]*i)

Replace with the \1 backreference, which refers to the text captured by ([^i]*i).

The pattern matches:

  • i - a literal i. After matching it, the regex index advances one character to the right (the regex engine processes the string from left to right; in re, there is no other option).
  • ([^i]*i) - Group 1, matching zero or more characters other than i, followed by the next i. The whole captured value is available as .group(1). After the match, the regex index sits just past the second i, which the pattern has consumed. Thus, no overlapping matches occur when the regex engine goes on to look for the remaining matches in the string.
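To make the difference concrete, here is a small sketch that prints the match spans for the lookbehind pattern from the question and for the consuming pattern. The variable names are only illustrative. On the sample sentence the lookbehind version finds three matches that chain into each other (each one starts right where the previous one ended, reusing that i as its anchor), while the consuming version finds two separate pairs.

```python
import re

text = "I am so irritated with regex. Seriously"

# The lookbehind from the question does not consume the leading i,
# so an i that ends one match can also anchor the next match.
lookbehind_spans = [m.span() for m in re.finditer(r"(?<=i).*?(i)", text)]

# The consuming pattern swallows both i's, so each i belongs to one pair only.
consuming_spans = [m.span() for m in re.finditer(r"i([^i]*i)", text)]

print(lookbehind_spans)  # [(9, 12), (12, 20), (20, 34)]
print(consuming_spans)   # [(8, 12), (19, 34)]
```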

Python demo:

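import re

pat = "i"  # the character whose pairs are processed
# re.escape guards against pat being a regex metacharacter such as "." or "*"
p = re.compile('{0}([^{0}]*{0})'.format(re.escape(pat)))
test_str = "I am so irritated with regex. Seriously"
# Keeping only Group 1 drops the first i of every pair
result = p.sub(r"\1", test_str)
print(result)  # I am so rritated wth regex. Seriously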
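The demo above removes the first i of each pair, which matches the original example in the question. If the goal is the one stated in the question's EDIT (drop the second i of each pair), one possible variation is to move the capturing group so that it covers the first i and the text between the two, leaving the second i outside the group. This is only a sketch: drop_second_of_pairs is a hypothetical helper name, and it assumes ch is a single character.

```python
import re

def drop_second_of_pairs(text, ch):
    """Remove the second occurrence of ch in every non-overlapping pair.

    Hypothetical helper; assumes ch is a single character.
    """
    c = re.escape(ch)  # protect against regex metacharacters
    # Group 1 holds the first ch plus everything up to, but not including,
    # the second ch. The second ch is matched outside the group and dropped.
    pattern = re.compile('({0}[^{0}]*){0}'.format(c))
    return pattern.sub(r"\1", text)

print(drop_second_of_pairs("I am so irritated with regex. Seriously", "i"))
# I am so irrtated with regex. Serously
```

The match is case-sensitive, so the capital "I" at the start of the sentence is not counted as part of any pair.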

Upvotes: 2
