How to capture text and replace it simultaneously in Python?

Question

My goal is to look through a string and extract the names of the following form: "$name$" (with no spaces). I want to extract the name (without the "$"s) and then replace the name with a number. So for example, I want to do something along the lines of the following:

import re

raw_string = "Hello $tim$ my name is $sam$ I'll call you $tim$"
m = re.compile(r"\$(\S+)\$")

to end up with something like this

names          # { "tim": 0, "sam": 1 }
parsed_string  # "Hello $0$ my name is $1$ I'll call you $0$"

Is there a better or more efficient way to do this without doing it in two steps using re.search() or re.findall() and then re.sub()?

Can you perform a sub on a Match Object in place?

Or would it be more efficient to just do two passes, to find all the matches and then replace them?

Sorry if this is a repeat question, I didn't find any solutions. Thanks for your help!

ctwheels · Accepted Answer

Combining match and replace

There's no function that I'm aware of (even in the PyPI regex library) that lets you both capture and replace simultaneously. What counts as "most Pythonic" is opinion-based, but I think this is a clean way to accomplish it in Python without having to do both a find (re.search or re.findall) and a replace (re.sub).

Conditional replacements

Conditional replacements are not possible without a callback, because you're replacing your text with different values. Yes, you could write a for loop and find every instance of \$([^$]+)\$, but then you run into a new issue: you can't replace duplicate instances with the same digit without additional logic (the second instance of $tim$ would become $2$ instead of $0$).
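To make the duplicate problem concrete, here is a minimal sketch of the naive loop, assuming each match is simply numbered by its position. The variable names (parts, last, naive) are illustrative only and not part of the original question.

```python
import re

s = "Hello $tim$ my name is $sam$ I'll call you $tim$"

# Naive approach: number every match by its position, ignoring repeats.
parts = []
last = 0
for i, m in enumerate(re.finditer(r'\$([^$]+)\$', s)):
    parts.append(s[last:m.start()])   # text between the previous match and this one
    parts.append("$" + str(i) + "$")  # positional index, not a per-name index
    last = m.end()
parts.append(s[last:])                # trailing text after the final match
naive = "".join(parts)

print(naive)  # Hello $0$ my name is $1$ I'll call you $2$
```

The second $tim$ ends up as $2$, which is why some name-to-index bookkeeping is needed whichever way the replacement is done.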

Then someone might think of backreferences. A backreference only works after its group has captured text, so you cannot replace multiple instances of $tim$ with $0$ without first having located each of them in the string. The group a backreference points to must already have matched: (.)\1 matches two identical characters, whereas a forward reference such as \1(.) has nothing to refer to yet. Some flavors (JavaScript, for example) treat the unset group as an empty match, so the pattern matches only one character; Python's re module refuses to compile it at all.
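A quick way to check the backreference behaviour in Python is sketched below. How a forward reference is handled depends on the regex flavor; Python's re module, for one, raises re.error when compiling it. The sample string "aab" and the flag name forward_ok are assumptions chosen for illustration.

```python
import re

# A backreference placed after its group matches a repeat of the captured text.
m = re.match(r'(.)\1', "aab")
print(m.group(0))  # aa

# Python's re module rejects a backreference to a group that is not defined yet.
try:
    re.compile(r'\1(.)')
    forward_ok = True
except re.error:
    forward_ok = False
print(forward_ok)  # False
```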

At this point, we might want to fall back to two method calls, one for searching and one for replacing. But there's one neat way of accomplishing this in a single call: a callback.

Using a callback

You can accomplish what you're trying to do by using a callback in re.sub. You still have to add logic for duplicate instances, but it's much better than making a call to two different methods for matching and replacing.

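import re

names = {}  # maps each name to the index of its first appearance

def repl(m):
    # Called once per match; m.group(1) is the text between the "$" signs.
    n = m.group(1)
    if n not in names:
        names[n] = len(names)  # first time seen: assign the next index
    return "$" + str(names[n]) + "$"

s = "Hello $tim$ my name is $sam$ I'll call you $tim$"
r = re.compile(r'\$([^$]+)\$')
s = r.sub(repl, s)  # one pass: repl records each name and returns its replacement

print(names)
print(s)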

Result:

{'tim': 0, 'sam': 1}
Hello $0$ my name is $1$ I'll call you $0$
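If the module-level names dict is a concern, the same callback idea can be wrapped in a function so each call gets its own mapping. This is only a sketch of one possible arrangement: the function name number_names is hypothetical, and dict.setdefault is used here in place of the explicit "if n not in names" check.

```python
import re

def number_names(text):
    """Replace each $name$ with $index$ and return (new_text, names)."""
    names = {}  # name -> index, in order of first appearance

    def repl(m):
        # setdefault stores the next index only the first time a name is seen
        return "$%d$" % names.setdefault(m.group(1), len(names))

    return re.sub(r'\$([^$]+)\$', repl, text), names

parsed, names = number_names("Hello $tim$ my name is $sam$ I'll call you $tim$")
print(parsed)  # Hello $0$ my name is $1$ I'll call you $0$
print(names)   # {'tim': 0, 'sam': 1}
```

Keeping the dict inside the function means repeated calls on different strings do not share or leak indices.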
