Reputation: 519
I have text with placeholders like:
sometext $plc_hldr1 some more text $plc_hldr2 some more more text $1234date_placeholder some text $5678date_placeholder
Then I have a dictionary whose keys are the placeholders and whose values are the text each placeholder should be replaced with:
placeholders = {'$plc_hldr1': '1111',
                '$plc_hldr2': 'abcd'}
I found and adjusted the function to process the replacement:
import re

def multiple_replace(adict, text):
    # Create a regular expression from all of the dictionary keys
    regex = re.compile("|".join(map(re.escape, adict.keys())))
    # For each match, look up the corresponding value in the dictionary
    return regex.sub(lambda match: adict[match.group(0)], text)
The function does its job for $plc_hldr1 and $plc_hldr2.
But there are also $1234date_placeholder and $5678date_placeholder, and both should be replaced with one predefined value. In those cases the date_placeholder part stays the same, but the numeric part is always different.
What I came up with is:
def multiple_replace(adict, text):
    # Replace every $<digits>date_placeholder with the fixed date first
    text = re.sub(r"\$\d*date_placeholder", "20200101", text)
    # Create a regular expression from all of the dictionary keys
    regex = re.compile("|".join(map(re.escape, adict.keys())))
    # For each match, look up the corresponding value in the dictionary
    return regex.sub(lambda match: adict[match.group(0)], text)
But is there a more elegant way of doing that, in case I have more placeholders with a variable numeric part that should be replaced by the same value (e.g. $1234dname_placeholder, $1234age_placeholder)?
Upvotes: 0
Views: 205
Reputation: 1411
You could combine \$\d*date_placeholder with the rest of the placeholders, provided the rest of the placeholders do not need to be escaped. Then create a second dictionary whose keys contain none of the special regular expression characters, and use it to look up the replacement for each regex match.
map(re.escape, adict.keys()) is necessary in your code above because the placeholder names contain the special regex character $. I would recommend adding the special character escapes yourself, and adding your \$\d*date_placeholder lookup as a key/value pair in placeholders. This removes both the need to map re.escape over your keys and the need for a second substitution in the multiple_replace function.
Like so...
import re

# Keys are regex patterns, escaped by hand; values are the replacements
placeholders = {r'\$plc_hldr1': '1111',
                r'\$plc_hldr2': 'abcd',
                r'\$\d*date_placeholder': '20200101'}

def remove_escape_chars(reggie):
    # Strip a literal "\$\d*", a "$" plus any digits, and stray backslashes
    return re.sub(r'\\\$\\d\*|\$\d*|\\', '', reggie)

def multiple_replace(escape_dict, text):
    # Create a second dictionary to look up regex match replacement targets
    unescaped_placeholders = {remove_escape_chars(k): escape_dict[k] for k in escape_dict}
    # Create a regular expression from all of the dictionary keys
    regex = re.compile("|".join(escape_dict.keys()))
    return regex.sub(lambda match: unescaped_placeholders[remove_escape_chars(match.group(0))], text)

text = "sometext $plc_hldr1 some more text $plc_hldr2 some more more text $1234date_placeholder some text $5678date_placeholder"
result = multiple_replace(placeholders, text)
print(result)
The downside of this approach is that you would have to update the regex in the remove_escape_chars(...) function if you introduced a new kind of pattern into placeholders. (It WILL extend to similar patterns, such as $1234dname_placeholder or $1234age_placeholder.)
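One possible way around that downside, offered as a sketch and not as part of the code above: wrap each dictionary key in its own capturing group and use match.lastindex to find out which key matched, so no helper for stripping regex syntax is needed. The function name multiple_replace_by_group is made up for this illustration, and the sketch assumes that none of the patterns contains a capturing group of its own.

```python
import re

# Keys are regex patterns, escaped by hand; values are the replacements
placeholders = {r'\$plc_hldr1': '1111',
                r'\$plc_hldr2': 'abcd',
                r'\$\d*date_placeholder': '20200101'}

def multiple_replace_by_group(pattern_dict, text):
    # Hypothetical helper name, used only for this sketch.
    # Assumption: no pattern has its own capturing group, so
    # group number i belongs to the i-th dictionary entry.
    replacements = list(pattern_dict.values())
    # Wrap every pattern in parentheses and join them with "|"
    regex = re.compile("|".join("(%s)" % p for p in pattern_dict))
    # match.lastindex is the number of the group that matched
    return regex.sub(lambda match: replacements[match.lastindex - 1], text)

text = "sometext $plc_hldr1 some more text $plc_hldr2 some more more text $1234date_placeholder some text $5678date_placeholder"
result = multiple_replace_by_group(placeholders, text)
print(result)
```

With this layout, adding a pattern such as r'\$\d*age_placeholder' to placeholders should only require the new dictionary entry.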
Upvotes: 1