Reputation: 6339
I’ve got some Python code that does text substitutions. An example would be:
xhtml = regex.sub(r"\bPhrensy", r"Frenzy", xhtml)  # Phrensy -> Frenzy
xhtml = regex.sub(r"\bphrensy", r"frenzy", xhtml)  # phrensy -> frenzy
Because the word can start with either case in the input, I need two lines, one for each case. It’d be nice to condense this into a single line with a capture group for the ([Pp]), but then the replacement text would always have the same case.
I’ve read the backreference docs to see if this functionality is present, but I can’t find anything. It’s probably not in the language, but just in case: am I missing a way to match case in backreference substitution in Python 3?
Ideally, case matching would also be Unicode-aware, but if it only works with ASCII that’s acceptable.
Upvotes: 2
Views: 321
Reputation: 627335
There is no such functionality: replacement backreferences always contain the exact text that was captured into the corresponding group.
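For instance, a backreference can preserve the captured letter's case, but only by copying the letter itself; it has no way to turn "p" into "f" while keeping its case. A small illustration using the standard re module:

```python
import re

text = "phrensy likes me. Phrensy doesn't."
# \1 re-inserts exactly what group 1 captured ("p" or "P"), unchanged.
result = re.sub(r"\b([Pp])hrensy", r"\1hrenzy", text)
print(result)
# => phrenzy likes me. Phrenzy doesn't.
```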
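What you can do is pass a function as the replacement. re.sub calls it with each match object and inserts whatever string it returns, so you can apply custom logic when replacing: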
import re
text = "phrensy likes me. Phrensy doesn't."
# The function receives each match and picks "F" or "f" from the captured letter's case.
print(re.sub(r"\b([Pp])hrensy", lambda m: ("F" if m.group(1).isupper() else "f") + "renzy", text))
# => frenzy likes me. Frenzy doesn't.
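If you have many such word pairs, the same callable-replacement idea can be wrapped in a reusable helper. This is only a sketch under some assumptions: the name sub_matching_case is hypothetical (not part of re), it uses re.IGNORECASE instead of a [Pp] class, it copies only the case of the first letter plus an extra all-caps rule, and, like the original patterns, it anchors only the start of the word with \b. Because re.IGNORECASE on str patterns is Unicode-aware in Python 3, it also covers non-ASCII letters.

```python
import re


def sub_matching_case(old, new, text):
    """Hypothetical helper: replace `old` (any case, starting at a word
    boundary) with `new`, copying the case of each match."""
    # re.IGNORECASE on a str pattern is Unicode-aware, so "élan" matches "Élan".
    pattern = re.compile(r"\b" + re.escape(old), re.IGNORECASE)

    def repl(m):
        matched = m.group(0)
        if matched.isupper():
            # Assumed extra rule: an all-caps match gives an all-caps replacement.
            return new.upper()
        # Otherwise copy only the case of the first letter.
        first = new[:1].upper() if matched[:1].isupper() else new[:1].lower()
        return first + new[1:]

    return pattern.sub(repl, text)


print(sub_matching_case("phrensy", "frenzy", "phrensy likes me. Phrensy doesn't."))
# => frenzy likes me. Frenzy doesn't.
print(sub_matching_case("élan", "verve", "Élan and élan"))
# => Verve and verve
```

Passing the lowercase forms of both words keeps each pair to a single call, which is the condensing the question asks for.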
Upvotes: 3