Reputation: 5278
Consider:
text = "abcdef"
pattern = "(b|e)cd(b|e)"
repl = [r"\1bla\2", r"\1blabla\2"]
text = re.sub(pattern, lambda m: random.choice(repl), text)
I want to replace matches randomly with entries of a list repl. But when using lambda m: random.choice(repl) as a callback, it doesn't replace \1, \2 etc. with its captures any more, returning "\1bla\2" as plain text.
I've tried to look up re.py on how they do it internally, so I might be able to call the same internal function, but it doesn't seem trivial.
The example above returns a\1bla\2f or a\1blabla\2f, while abblaef or abblablaef are the valid options in my case.
Note that I'm using a function because, in case of several matches like text = "abcdef abcdef", it should randomly choose a replacement from repl for every match, instead of using the same replacement for all matches.
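A minimal reproduction of the behaviour described above, with the imports the snippet needs. The variable name `result` is only for illustration.

```python
import random
import re

text = "abcdef"
pattern = r"(b|e)cd(b|e)"
repl = [r"\1bla\2", r"\1blabla\2"]

# The string returned by a callback is inserted literally:
# re.sub does not process \1 or \2 in it.
result = re.sub(pattern, lambda m: random.choice(repl), text)
print(result)
```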
Upvotes: 7
Views: 1031
Reputation: 4418
In the example, the capture groups are put back where they were without change. So change the pattern to use lookahead and lookbehind assertions instead:
replacements = ['bla', 'blabla']
re.sub(r"(?<=b|e)cd(?=b|e)", lambda mo: random.choice(replacements), text)
This matches cd if preceded by b or e and followed by b or e.
Alternatively, the replacement function receives a match object, so it has access to all the match groups:
re.sub(pattern, lambda mo: f"{mo[1]}{random.choice(replacements)}{mo[2]}", text)
where mo is the match object, mo[1] is the first capture group and mo[2] is the second.
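Both variants can be put side by side as a self-contained sketch. The helper names `with_lookarounds` and `with_groups` are made up for this example and are not part of the `re` module.

```python
import random
import re

text = "abcdef abcdef"
replacements = ["bla", "blabla"]

def with_lookarounds(s):
    # Only "cd" is consumed; the neighbouring b/e are asserted, not matched,
    # so they stay in the text untouched.
    return re.sub(r"(?<=b|e)cd(?=b|e)", lambda mo: random.choice(replacements), s)

def with_groups(s):
    # The groups are part of the match, so the callback puts them back itself.
    return re.sub(
        r"(b|e)cd(b|e)",
        lambda mo: f"{mo[1]}{random.choice(replacements)}{mo[2]}",
        s,
    )

print(with_lookarounds(text))
print(with_groups(text))
```

Each word in the output should be either abblaef or abblablaef, chosen independently per match.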
Upvotes: 0
Reputation: 147206
One way to do this (and ensure random replacements) is to nest calls to re.sub:
text = "abcdef abcdef"
pattern = "(b|e)cd(b|e)"
repl = [r"\1bla\2", r"\1blabla\2"]
text = re.sub(pattern, lambda m: re.sub(r'\\(\d+)', lambda m1: m.group(int(m1.group(1))), random.choice(repl)), text)
print(text)
Output varies between
abblaef abblaef
abblaef abblablaef
abblablaef abblaef
abblablaef abblablaef
It turns out my nested call was basically the equivalent of m.expand, as described in Mark Meyer's answer.
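A quick check of that equivalence for the two templates used here. This is only a sketch: it covers plain numeric references such as \1 and \2, whereas `Match.expand` additionally understands forms like `\g<1>` and escapes like `\n`, which the hand-rolled inner `re.sub` does not.

```python
import re

pattern = r"(b|e)cd(b|e)"
m = re.search(pattern, "abcdef")

for template in (r"\1bla\2", r"\1blabla\2"):
    # Hand-rolled: replace each \N in the template with group N of m.
    nested = re.sub(r"\\(\d+)", lambda m1: m.group(int(m1.group(1))), template)
    # Built-in template expansion on the match object.
    assert nested == m.expand(template)
    print(template, "->", nested)
```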
Upvotes: 1
Reputation: 92460
If you pass a function you lose the automatic expansion of backreferences. You just get the match object and have to do the work. So you could:
Pick a replacement string up front and pass that rather than a function (note that the same replacement is then used for every match):
text = "abcdef"
pattern = "(b|e)cd(b|e)"
repl = [r"\1bla\2", r"\1blabla\2"]
re.sub(pattern, random.choice(repl), text)
# 'abblaef' or 'abblablaef'
Or write a function that processes the match object and allows more complex processing. You can take advantage of expand to use back references:
text = "abcdef abcdef"
pattern = "(b|e)cd(b|e)"
def repl(m):
    repl = [r"\1bla\2", r"\1blabla\2"]
    return m.expand(random.choice(repl))
re.sub(pattern, repl, text)
# 'abblaef abblablaef' and variations
You can, of course, put that function into a lambda:
repl = [r"\1bla\2", r"\1blabla\2"]
re.sub(pattern, lambda m: m.expand(random.choice(repl)), text)
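To make the difference between the two approaches visible, here is a self-contained sketch with the imports included. The names `templates`, `once` and `per_match` are just for illustration, and the input is repeated eight times so that the per-match draws are likely to differ.

```python
import random
import re

pattern = r"(b|e)cd(b|e)"
templates = [r"\1bla\2", r"\1blabla\2"]
text = "abcdef " * 8

# One template chosen up front: every match gets the same replacement.
once = re.sub(pattern, random.choice(templates), text)

# Template chosen inside the callback: each match gets its own draw,
# and m.expand fills in \1 and \2 from that match.
per_match = re.sub(pattern, lambda m: m.expand(random.choice(templates)), text)

print(set(once.split()))       # a single distinct word
print(set(per_match.split()))  # usually both words, depending on the draws
```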
Upvotes: 8