Reputation: 1054
My goal is to look through a string and extract names of the form "$name$" (with no spaces). I want to extract the name (without the "$"s) and then replace the name with a number. For example, I want to do something along the lines of the following:
raw_string = "Hello $tim$ my name is $sam$ I'll call you $tim$"
m = re.compile(r"\$(\S+)\$")
to end up with something like this:
names # { "tim": 0, "sam": 1 }
parsed_string # "Hello $0$ my name is $1$ I'll call you $0$"
Is there a better or more efficient way to do this without doing it in two steps, using re.search() or re.findall() and then re.sub()?
Can you perform a sub on a Match object in place?
Or would it be more efficient to just do two passes, to find all the matches and then replace them?
Sorry if this is a repeat question, I didn't find any solutions. Thanks for your help!
Upvotes: 4
Views: 774
Reputation: 13858
As mentioned, a regex on its own might not be the best answer here, because the repeated matches of $tim$ are not automatically tied to the same key across the text. Normally in these cases you would replace a placeholder such as %0 with the actual text (tim), but given that you do want your desired result, you need some extra handling to get the keys to correspond:
import re
text = "Hello $tim$ my name is $sam$ I'll call you $tim$"
pat = re.compile(r'\$(.+?)\$')
# map each distinct name to its order of first appearance (0, 1, 2, ...);
# dict.fromkeys drops duplicates while keeping first-seen order
map_keys = {k: str(i) for i, k in enumerate(dict.fromkeys(pat.findall(text)))}
map_keys
# {'tim': '0', 'sam': '1'}
Once you have built that, you could do a re.sub with a custom function:
result = pat.sub(lambda m: '${}$'.format(map_keys[m.group(1)]), text)
result
# "Hello $0$ my name is $1$ I'll call you $0$"
Note that this is not optimal, as you need to recreate the $ placeholders to match your desired text.
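One way to avoid rebuilding the $ delimiters is to match only the name and assert the surrounding $ signs with lookarounds. The following is a minimal sketch, not part of the original answer; the names inner and ids are illustrative. It assumes names contain no whitespace and that two placeholders never sit back to back (in a string like "$a$b$" this pattern would also treat b as a name).

```python
import re

text = "Hello $tim$ my name is $sam$ I'll call you $tim$"

# Lookbehind/lookahead check for the surrounding "$" without consuming them,
# so only the name itself is replaced and the delimiters stay in place.
inner = re.compile(r"(?<=\$)[^$\s]+(?=\$)")

ids = {}  # name -> number, assigned in order of first appearance
result = inner.sub(lambda m: str(ids.setdefault(m.group(0), len(ids))), text)

print(ids)     # {'tim': 0, 'sam': 1}
print(result)  # Hello $0$ my name is $1$ I'll call you $0$
```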
Upvotes: 1
Reputation: 22817
There's no function that I'm aware of (even in the PyPI regex library) that allows you to both capture and replace simultaneously. What counts as "most Pythonic" is opinion-based, but I think this is a clean way to accomplish this in Python without having to do both a find (re.search or re.findall) and a replace (re.sub).
Conditional replacements are not possible without a callback, as you're replacing your text with different values. Yes, you could create a for loop and find every instance of \$([^$]+)\$, but then you run into a new issue: you can't replace duplicate instances with the same digit without additional logic (the second instance of $tim$ would become $2$ instead of $0$).
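To make the duplicate problem concrete, here is a minimal sketch of a replacement that simply numbers matches in order without remembering which names it has already seen. The names naive_repl and counter are illustrative and do not come from the original post.

```python
import itertools
import re

text = "Hello $tim$ my name is $sam$ I'll call you $tim$"
pattern = re.compile(r"\$([^$]+)\$")

# Hypothetical naive approach: hand out a fresh number for every match.
counter = itertools.count()

def naive_repl(match):
    # The captured name is ignored, so repeated names are not recognised.
    return "${}$".format(next(counter))

naive = pattern.sub(naive_repl, text)
print(naive)  # Hello $0$ my name is $1$ I'll call you $2$
```

The last placeholder comes out as $2$ rather than $0$, which is why some per-name bookkeeping is needed.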
Then someone might think of backreferences. Since a backreference only works after the text has been captured, you cannot replace your multiple instances of $tim$ with $0$ without first having located each of them in the string. Backreferences won't work because the group they reference must have matched before the backreference is used. In flavors that tolerate such a forward reference (JavaScript, for example), it is treated as an empty string: \1(.) only matches one character, since \1 refers to capture group 1, which has not matched anything yet. Python's re module rejects that pattern outright with an "invalid group reference" error. By contrast, (.)\1 matches two identical characters.
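In Python's re module specifically, the two patterns can be checked as follows. This is a small sketch with illustrative variable names; note that Python does not treat the forward reference as an empty match but refuses to compile the pattern at all.

```python
import re

# (.)\1 : group 1 captures one character, then \1 must repeat that same character.
doubled = re.fullmatch(r"(.)\1", "aa")
mixed = re.fullmatch(r"(.)\1", "ab")
print(doubled is not None)  # True
print(mixed is not None)    # False

# \1(.) : the backreference appears before group 1 has been defined.
# Python's re raises re.error when compiling this pattern.
try:
    re.compile(r"\1(.)")
    forward_ref_compiles = True
except re.error as exc:
    forward_ref_compiles = False
    print("re.error:", exc)
```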
At this point, we might want to fall back to two method calls for searching and replacing. But there's one neat way of accomplishing this: a callback.
You can accomplish what you're trying to do by using a callback in re.sub. You still have to add logic for duplicate instances, but it's much better than making calls to two different methods for matching and replacing.
import re

names = {}

def repl(m):
    # assign the next free number the first time a name is seen
    n = m.group(1)
    if n not in names:
        names[n] = len(names)
    return "$" + str(names[n]) + "$"

s = "Hello $tim$ my name is $sam$ I'll call you $tim$"
r = re.compile(r'\$([^$]+)\$')
s = r.sub(repl, s)
print(names)
print(s)
Result:
{'tim': 0, 'sam': 1}
Hello $0$ my name is $1$ I'll call you $0$
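If the same logic is needed for more than one string, a module-level names dictionary would carry numbers over from one call to the next. A possible variation, sketched here with the hypothetical names number_names and NAME_PATTERN, keeps the mapping inside a function and returns it together with the rewritten text:

```python
import re

NAME_PATTERN = re.compile(r"\$([^$]+)\$")

def number_names(text):
    """Replace each $name$ with $n$ and return (new_text, names)."""
    names = {}  # fresh mapping per call, so no state leaks between strings

    def repl(match):
        name = match.group(1)
        # setdefault stores the next free index only the first time a name is seen
        index = names.setdefault(name, len(names))
        return "${}$".format(index)

    return NAME_PATTERN.sub(repl, text), names

parsed, names = number_names("Hello $tim$ my name is $sam$ I'll call you $tim$")
print(names)   # {'tim': 0, 'sam': 1}
print(parsed)  # Hello $0$ my name is $1$ I'll call you $0$
```

This is still a single pass over the string; the closure just replaces the global variable.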
Upvotes: 2