JanFi86

Reputation: 519

Replacing placeholders with dictionary keys/values

I have text with placeholders like:

sometext $plc_hldr1 some more text $plc_hldr2 some more more text $1234date_placeholder some text $5678date_placeholder

Then I have a dictionary whose keys are the placeholders and whose values are what each placeholder should be replaced with:

placeholders = {'$plc_hldr1': '1111',
                '$plc_hldr2': 'abcd'}

I found and adapted a function that does the replacement:

import re

def multiple_replace(adict, text):
    # Create a regular expression from all of the dictionary keys
    regex = re.compile("|".join(map(re.escape, adict.keys())))

    # For each match, look up the corresponding value in the dictionary
    return regex.sub(lambda match: adict[match.group(0)], text)
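A quick check of why re.escape is needed in this function: every key starts with $, which is a regex metacharacter. The snippet below is only an illustration using the first sample placeholder; the variable name pattern is made up for the example.

```python
import re

text = "sometext $plc_hldr1 some more text"

# Unescaped, "$" is the end-of-string anchor, so nothing can follow it
# and this pattern never matches.
print(re.search('$plc_hldr1', text))        # None

# re.escape turns "$" into "\$", which matches a literal dollar sign.
pattern = re.escape('$plc_hldr1')
print(pattern)                              # \$plc_hldr1
print(re.search(pattern, text).group(0))    # $plc_hldr1
```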

The function does its job for $plc_hldr1 and $plc_hldr2.

But there are also $1234date_placeholder and $5678date_placeholder, and both should be replaced with one predefined value. In these cases the date_placeholder part stays the same, but the numeric part is always different.

What I came up with is:

def multiple_replace(adict, text):
    # Create a regular expression from all of the dictionary keys
    regex = re.compile("|".join(map(re.escape, adict.keys())))
    # Replace every $<digits>date_placeholder with one fixed value first
    text = re.sub(r"\$\d*date_placeholder", "20200101", text)
    # For each match, look up the corresponding value in the dictionary
    return regex.sub(lambda match: adict[match.group(0)], text)

But is there a more elegant way of doing this, especially if I have more placeholders with a variable numeric part that should all be replaced by the same value (e.g. $1234dname_placeholder, $1234age_placeholder)?
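One possible alternative, offered as a sketch and not taken from the answer below: match a $, an optional run of digits and a name in a single regex, then look up only the captured name. The dictionary is keyed by bare names (the names values and PLACEHOLDER are illustrative), so a new placeholder such as $1234age_placeholder would only need a new dictionary entry. This assumes every placeholder name starts with a letter or underscore and consists of word characters; unknown placeholders are left unchanged.

```python
import re

# Keys are bare placeholder names: no "$" and no numeric prefix.
values = {'plc_hldr1': '1111',
          'plc_hldr2': 'abcd',
          'date_placeholder': '20200101'}

# "$", then an optional run of digits, then a name starting with a
# letter or underscore (captured as group 1).
PLACEHOLDER = re.compile(r'\$\d*([A-Za-z_]\w*)')

def multiple_replace(values, text):
    # Look up the captured name; keep unknown placeholders as they are.
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), text)

text = ("sometext $plc_hldr1 some more text $plc_hldr2 some more more text "
        "$1234date_placeholder some text $5678date_placeholder")
print(multiple_replace(values, text))
```

Only the leading digits are stripped, so the trailing 1 in plc_hldr1 stays part of the name and plc_hldr1 and plc_hldr2 remain distinct keys. The trade-off is that a numeric prefix is ignored for every key, so $99plc_hldr1 would also become 1111.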

Upvotes: 0

Views: 205

Answers (1)

SpaceKatt

Reputation: 1411

You could combine \$\d*date_placeholder with the rest of the placeholders, IF the rest of the placeholders do not need to be passed through re.escape (which would also turn the \d* part into literal text). Then, create a second dictionary with the special regular expression characters stripped out, to use when looking up what to replace each regex match with.

map(re.escape, adict.keys()) is necessary in your code above since you have the special regex character $ in the placeholder names. I would recommend adding the special character escapes yourself, and adding your \$\d*date_placeholder lookup as a key/value pair in placeholders. This removes both the need to map re.escape over your keys and the need to use a second substitution in the multiple_replace function.

Like so...

import re

placeholders = {r'\$plc_hldr1': '1111',
                r'\$plc_hldr2': 'abcd',
                r'\$\d*date_placeholder': '20200101'}

def remove_escape_chars(reggie):
    # Strip a literal "\$\d*", a "$" plus any digits, and any remaining backslashes
    return re.sub(r'\\\$\\d\*|\$\d*|\\', '', reggie)

def multiple_replace(escape_dict, text):
    # Create a second dictionary to look up regex match replacement targets
    unescaped_placeholders = {remove_escape_chars(k): v for k, v in escape_dict.items()}

    # Create a regular expression from all of the dictionary keys
    regex = re.compile("|".join(escape_dict.keys()))
    return regex.sub(lambda match: unescaped_placeholders[remove_escape_chars(match.group(0))], text)

text = "sometext $plc_hldr1 some more text $plc_hldr2 some more more text $1234date_placeholder some text $5678date_placeholder"

result = multiple_replace(placeholders, text)
print(result)

The downside of this approach is that you would have to update the regex in the remove_escape_chars(...) function if you introduced a pattern of a different shape into placeholders (anything other than a literal \$name or a \$\d*name key). It WILL extend to similar patterns, such as $1234dname_placeholder or $1234age_placeholder.
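To illustrate that last point, this sketch copies remove_escape_chars from above and shows that both the escaped dictionary keys and the strings the regex actually matches reduce to the same lookup key. The age_placeholder entry and its value '42' are hypothetical, added only for the example; note that it needs a new key but no change to the helper.

```python
import re

def remove_escape_chars(reggie):
    # Strip a literal "\$\d*", a "$" plus any digits, and any remaining backslashes
    return re.sub(r'\\\$\\d\*|\$\d*|\\', '', reggie)

# Escaped dictionary keys reduce to bare names...
print(remove_escape_chars(r'\$plc_hldr1'))            # plc_hldr1
print(remove_escape_chars(r'\$\d*date_placeholder'))  # date_placeholder
# ...and so do the strings the regex matches in the text.
print(remove_escape_chars('$plc_hldr1'))              # plc_hldr1
print(remove_escape_chars('$1234date_placeholder'))   # date_placeholder

# A new key of the same shape works without touching remove_escape_chars.
# The value '42' is hypothetical.
placeholders = {r'\$\d*age_placeholder': '42'}
lookup = {remove_escape_chars(k): v for k, v in placeholders.items()}
regex = re.compile("|".join(placeholders))
print(regex.sub(lambda m: lookup[remove_escape_chars(m.group(0))],
                "age: $1234age_placeholder"))          # age: 42
```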

Upvotes: 1
