Cadel Watson

Reputation: 511

Python re.sub: replace part of matching string that contains an arbitrary number of capturing groups

I know that there are other questions which deal with the problem of replacing only part of a matching string using re.sub, but the answers revolve around referring back to capturing groups. My situation is a bit different:

I'm generating regexes like '(?:i|æ|ʏ|ɞ).(?:i|æ|ʏ|ɞ)' and '^.' in another part of the application. Given a string such as 'abcd' and a pair such as ('b', 'c'), I want to replace every b with c, but only where the regex matches with b at the position of the period character (.).

For example, if I have the rule '(?:x|y|z).(?:h|i|j)', and the desired change is a to b, the following should occur:

xah -> xbh
yai -> ybi
zaz -> zaz (no change)

I've tried using re.sub, replacing the . with my target in the search string and with my replacement in the replacement string, but this replaces the whole match in the target string, when I only want to change a small part of it. The problem with using capturing groups and referring back to them in the replacement is that I don't know how many there will be or what order they'll be in (there might not even be any), so I'm looking for a flexible solution.

Any help is much appreciated! It's quite difficult to explain, so if further clarification is needed, please ask :).

Upvotes: 1

Views: 899

Answers (1)

Robᵩ

Reputation: 168616

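You could use "lookahead" and "lookbehind" assertions. They check the text around the match without making it part of the match, so only the character between them is replaced: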

import re

tests = (
    ('xah', 'xbh'),
    ('yai', 'ybi'),
    ('zaz', 'zaz'),
)

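for test_in, test_out in tests:
    # (?<=...) must precede the 'a' and (?=...) must follow it;
    # neither is consumed, so re.sub replaces only the 'a'.
    out = re.sub(r'(?<=x|y|z)a(?=h|i|j)', 'b', test_in)
    assert test_out == out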
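
Since the rules in the question are generated elsewhere, the same idea can be applied to an arbitrary rule by splitting it at the period. The following is only a minimal sketch: substitute_at_dot is a hypothetical helper name, and it assumes the rule contains exactly one '.' (the placeholder) and that the part before it matches a fixed number of characters, because Python's re module only accepts fixed-width lookbehinds.

```python
import re

def substitute_at_dot(rule, old, new, text):
    """Replace `old` with `new` wherever `rule` matches with `old` at its '.'.

    Hypothetical helper. Assumes `rule` contains exactly one '.' and that
    the part before the '.' has a fixed width (an re lookbehind requirement).
    """
    # Split the rule into the context before and after the placeholder.
    before, after = rule.split('.', 1)

    pattern = ''
    if before:
        # Context that must precede the target, without being consumed.
        pattern += '(?<=' + before + ')'
    # The target itself is matched literally.
    pattern += re.escape(old)
    if after:
        # Context that must follow the target, without being consumed.
        pattern += '(?=' + after + ')'

    # A function replacement inserts `new` literally (no backslash processing).
    return re.sub(pattern, lambda match: new, text)

for text in ('xah', 'yai', 'zaz'):
    print(text, '->', substitute_at_dot('(?:x|y|z).(?:h|i|j)', 'a', 'b', text))
```

Because the surrounding context is never part of the match, it does not matter how many capturing groups the rule contains or in what order they appear.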

Upvotes: 3
