Ed Avis

Reputation: 1502

Python re.sub: backreference in replacement pattern followed by digit

I would like to match a regular expression in a string and add the character 0 after all occurrences. That is, each match will be replaced with itself followed by 0. But because 0 is a digit, I don't know how to write it in the replacement pattern given as the second argument to re.sub.

Let me give an example of an easier problem: add the character X after all vowels.

import re
s = 'hello'
r = re.sub('([aeiou])', r'\1X', s)
print(r)

This prints heXlloX.

But suppose instead of adding the character X I want to add the character 0. If I try to write this

r = re.sub('([aeiou])', r'\10', s)

then Python parses \10 as a backreference to capturing group number 10, and fails with the error invalid group reference 10.
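To make the failure concrete, here is a minimal sketch that catches the exception instead of letting it stop the script. The variable name outcome is only for illustration, and the exact wording of the message may vary between Python versions.

```python
import re

s = 'hello'
try:
    # The pattern defines only group 1, so \10 is read as group 10 and rejected.
    re.sub('([aeiou])', r'\10', s)
    outcome = 'no error'
except re.error as exc:
    outcome = str(exc)
print(outcome)
```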

I know for this particular pattern I could rework it as a lookbehind assertion, so that the replacement pattern would no longer need a backreference.

r = re.sub('(?<=[aeiou])', '0', s)

That works -- but not all regular expressions can be used as a lookbehind in this way, since Python's re module only accepts lookbehind patterns that match a fixed width.

Another approach would be to manually break apart the input string at match locations, perhaps with re.finditer, then paste it back together with the 0 character at the places I want. But I'm hoping to avoid that.
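For comparison, the manual approach might look roughly like the sketch below. The helper name append_after_matches is made up for this example; it is one possible way to do the splitting and rejoining, not a standard function.

```python
import re

def append_after_matches(pattern, suffix, text):
    """Return text with suffix inserted after every match of pattern."""
    pieces = []
    last_end = 0
    for m in re.finditer(pattern, text):
        # Copy everything up to and including this match, then the suffix.
        pieces.append(text[last_end:m.end()])
        pieces.append(suffix)
        last_end = m.end()
    # Copy whatever follows the final match.
    pieces.append(text[last_end:])
    return ''.join(pieces)

result = append_after_matches('[aeiou]', '0', 'hello')
print(result)
```

It works, but it is a lot of bookkeeping for what should be a one-line substitution.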

While writing this question I have found the answer, which I will post below.

Upvotes: 0

Views: 18

Answers (1)

Ed Avis

Reputation: 1502

re.sub can take a function as its second argument. That function is passed the Match object.

import re
s = 'hello'
def f(matchobj):
    return matchobj.group(1) + '0'
r = re.sub('([aeiou])', f, s)
print(r)

This prints he0llo0 as required.

In general I think any replacement pattern with backreferences \N can be rewritten as a callback which uses matchobj.group(N).
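As a small illustration of that rewrite, the sketch below expresses the same substitution both ways, using a two-group pattern that I made up for the example. One caveat I am aware of: if a group did not take part in the match, matchobj.group(N) returns None, whereas a \N template substitutes an empty string (Python 3.5 and later), so a callback would need to handle that case itself.

```python
import re

text = 'hello world'
pattern = r'(\w+) (\w+)'

# Template form: backreferences \2 and \1 swap the two words.
via_template = re.sub(pattern, r'\2 \1', text)

# Callback form: the same replacement built from group(2) and group(1).
def swap(matchobj):
    return matchobj.group(2) + ' ' + matchobj.group(1)

via_callback = re.sub(pattern, swap, text)
print(via_template, '|', via_callback)
```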

Interestingly, because matchobj.group(0) gives the whole match, you can do without the capturing group:

import re
s = 'hello'
def f(matchobj):
    return matchobj.group(0) + '0'
r = re.sub('[aeiou]', f, s)
print(r)

That also works.
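For completeness, the replacement template syntax also has an explicit form, \g<N>, which the re documentation describes as a way to avoid exactly this ambiguity: \g<1>0 means group 1 followed by a literal 0, and \g<0> stands for the whole match. A minimal sketch, with variable names chosen only for illustration:

```python
import re

s = 'hello'

# \g<1>0 is group 1 followed by a literal '0', not a reference to group 10.
by_number = re.sub('([aeiou])', r'\g<1>0', s)

# \g<0> refers to the whole match, so no capturing group is needed.
whole_match = re.sub('[aeiou]', r'\g<0>0', s)

print(by_number, whole_match)
```

This keeps the replacement as a plain string, which may be preferable when the callback would do nothing more than concatenate.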

Upvotes: 0
