Robin Whittleton

Reputation: 6339

Match case in backreference in re.sub

I’ve got some Python code that does text substitutions. An example would be:

xhtml = regex.sub(r"\bPhrensy", r"Frenzy", xhtml) # Phrensy -> Frenzy
xhtml = regex.sub(r"\bphrensy", r"frenzy", xhtml) # phrensy -> frenzy

Because the word can start with either case in the input, we need two lines for each substitution. It’d be nice to condense them into a single line with a capture group such as ([Pp]), but then the replacement would always have a single, fixed case.

I’ve read the backreference docs to see if this functionality is present, but I can’t see anything. It’s probably not in the language, but just in case: am I missing case matching in backreference substitution in Python 3?

Ideally, case matching would also be Unicode case aware, but if it only works with ASCII that’s acceptable.

Upvotes: 2

Views: 321

Answers (1)

Wiktor Stribiżew

Reputation: 627335

There is no such functionality: replacement backreferences always contain the exact text that was captured into the corresponding group.
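To illustrate, here is a small sketch (the sample string is made up for this example, not taken from the question): a `\1` in the replacement template re-inserts the captured letter verbatim, so its case is preserved but the letter itself cannot be swapped for another one.

```python
import re

# Hypothetical sample text, used only for illustration.
sample = "Phrensy and phrensy"

# \1 inserts exactly what group 1 captured ("P" or "p"), unchanged.
result = re.sub(r"\b([Pp])hrensy", r"\1renzy", sample)
print(result)
# => Prenzy and prenzy
```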

What you can do is evaluate the match and apply custom logic when replacing:

import re

text = "phrensy likes me. Phrensy doesn't."
# Pick "F" or "f" depending on the case of the captured first letter.
print(re.sub(r"\b([Pp])hrensy", lambda m: ("F" if m.group(1).isupper() else "f") + "renzy", text))
# => frenzy likes me. Frenzy doesn't.
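If there are many such word pairs, the lambda above could be factored into a reusable callback. The following is only a sketch under some assumptions: `match_first_case` is a hypothetical helper name (not part of `re`), it only looks at the first character of the match, and it relies on `re.IGNORECASE` instead of a `[Pp]` group. Since `str.isupper()` and `str.upper()` are Unicode-aware in Python 3, it should also cope with non-ASCII initial letters.

```python
import re

def match_first_case(replacement):
    """Hypothetical helper: build a re.sub callback that copies the
    case of the match's first character onto the replacement."""
    def repl(match):
        # str.isupper() is Unicode-aware, so this is not limited to ASCII.
        if match.group(0)[:1].isupper():
            return replacement[:1].upper() + replacement[1:]
        return replacement
    return repl

text = "phrensy likes me. Phrensy doesn't."
result = re.sub(r"\bphrensy", match_first_case("frenzy"), text, flags=re.IGNORECASE)
print(result)
# => frenzy likes me. Frenzy doesn't.
```

Note that with `re.IGNORECASE` an all-caps "PHRENSY" would also match and come out as "Frenzy", because only the first letter's case is inspected; keep the explicit `[Pp]` group if that is not wanted.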


Upvotes: 3
