Gannina

Reputation: 133

Regex replace matches

Assuming I have this entry in a dictionary:

'Michaele Frendu': ['Micheli Frendu', 'Michael', 'Michaele']

This means that every occurrence of any value in the list must be replaced by the key.

For example, given this sample input:

s = 'concessit et assignavit Micheli Frendu presenti viridarium'

the string should become:

s = 'concessit et assignavit Michaele Frendu presenti viridarium'

The problem arises when the text already contains Michaele Frendu and Michaele is also an item in the list, for example:

s = 'Pro Michaele Frendu contra Lucam Zamit'

This gets changed to:

s = 'Pro Michaele Frendu Frendu contra Lucam Zamit'

whereas my desired output is:

s = 'Pro Michaele Frendu contra Lucam Zamit'

In this case I don't want any replacement, because the text already contains the full key.

I am using this regex pattern, but it is not working:

my_regex = r"\b(?=\w)" + re.escape(l) + r"\b(?!\w)"
s = re.sub(my_regex, k, s)

where k is the key and l is a value from the list.
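A minimal reproduction of the problem, assuming the substitution runs in a loop over every value of every dictionary entry (the loop is not shown above, so its shape is an assumption):

```python
import re

d = {'Michaele Frendu': ['Micheli Frendu', 'Michael', 'Michaele']}
s = 'Pro Michaele Frendu contra Lucam Zamit'

result = s
for k, values in d.items():      # assumed outer loop over dictionary entries
    for l in values:             # assumed inner loop over the variants
        my_regex = r"\b(?=\w)" + re.escape(l) + r"\b(?!\w)"
        # 'Michaele' matches the first word of the existing 'Michaele Frendu',
        # so the key is inserted in front of the surname that is already there.
        result = re.sub(my_regex, k, result)

print(result)  # Pro Michaele Frendu Frendu contra Lucam Zamit
```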

Upvotes: 1

Views: 52

Answers (1)

blhsing

Reputation: 106455

You can simply put the replacement (the key) first in the regex alternation. Alternatives are tried from left to right, so text that already contains the full key matches the key itself and is replaced with itself, before any of the shorter variants gets a chance to match:

import re

d = {'Michaele Frendu': ['Micheli Frendu', 'Michael', 'Michaele']}
s = 'Pro Michaele Frendu contra Lucam Zamit'
for k, v in d.items():
    # Key first, then the variants: an existing key is consumed as a whole.
    print(re.sub('|'.join(map(re.escape, (k, *v))), k, s))

This outputs:

Pro Michaele Frendu contra Lucam Zamit

And with s = 'concessit et assignavit Micheli Frendu presenti viridarium', this outputs:

concessit et assignavit Michaele Frendu presenti viridarium

For clarity, note that '|'.join(map(re.escape, (k, *v))) evaluates to the following pattern for this entry:

Michaele\ Frendu|Micheli\ Frendu|Michael|Michaele
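One thing to watch with a pattern like this: 'Michael' is listed before 'Michaele', so a bare 'Michaele' in the text would match as 'Michael' and leave a stray 'e' behind, and without word boundaries 'Michael' could also match inside a longer word. A possible refinement, offered as a sketch rather than a definitive fix, is to sort the alternatives longest first and wrap them in \b anchors. The helper names build_pattern and normalize are hypothetical, introduced only for this illustration:

```python
import re

d = {'Michaele Frendu': ['Micheli Frendu', 'Michael', 'Michaele']}

def build_pattern(key, variants):
    # Hypothetical helper: longest alternatives first, so the full key is
    # tried before 'Michaele', and 'Michaele' before 'Michael'.
    alternatives = sorted({key, *variants}, key=lambda a: (-len(a), a))
    body = '|'.join(map(re.escape, alternatives))
    # \b on both sides keeps 'Michael' from matching inside a longer word.
    return re.compile(r'\b(?:' + body + r')\b')

def normalize(text, mapping):
    # Hypothetical helper: apply every entry of the mapping to the text.
    for key, variants in mapping.items():
        # A function replacement avoids backslash processing in the key.
        text = build_pattern(key, variants).sub(lambda m, key=key: key, text)
    return text

print(normalize('Pro Michaele Frendu contra Lucam Zamit', d))
print(normalize('concessit et assignavit Micheli Frendu presenti viridarium', d))
print(normalize('Pro Michaele contra Lucam Zamit', d))
```

With this ordering the first two inputs give the same results as above, and the third becomes 'Pro Michaele Frendu contra Lucam Zamit' instead of picking up a trailing 'e'.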

Upvotes: 1
