Reputation: 1502
I would like to match a regular expression in a string and add the character 0
after all occurrences. That is, each match will be replaced with itself followed by 0
. But because 0
is a digit, I don't know how to write it in the replacement pattern given as the second argument to re.sub
.
Let me give an example of an easier problem: add the character X
after all vowels.
import re
s = 'hello'
r = re.sub('([aeiou])', r'\1X', s)
print(r)
This prints heXlloX
.
But suppose instead of adding the character X
I want to add the character 0
. If I try to write this
r = re.sub('([aeiou])', r'\10', s)
then it thinks I am making a backreference to the capturing group numbered 10, and fails with invalid group reference 10
.
I know for this particular pattern I could rework it as a lookbehind assertion, so that the replacement pattern would no longer need a backreference.
r = re.sub('(?<=[aeiou])', '0', s)
That works -- but not all regular expressions can be used as lookbehind in this way.
Another approach would be to manually break apart the input string at match locations, perhaps with re.finditer
, then paste it back together with the 0
character at the places I want. But I'm hoping to avoid that.
While writing this question I have found the answer, which I will post below.
Upvotes: 0
Views: 18
Reputation: 1502
re.sub
can take a function as its second argument. That function is passed the Match
object.
import re
s = 'hello'
def f(matchobj):
return matchobj.group(1) + 'X'
r = re.sub('([aeiou])', f, s)
print(r)
This prints he0llo0
as required.
In general I think any replacement pattern with backreferences \N
can be rewritten as a callback which uses matchobj.group(N)
.
Interestingly, because matchobj.group(0)
gives the whole match, you can do without the capturing group:
import re
s = 'hello'
def f(matchobj):
return matchobj.group(0) + '0'
r = re.sub('[aeiou]', f, s)
print(r)
That also works.
Upvotes: 0