Reputation: 133
Assuming I have this entry in a dictionary:
'Michaele Frendu': ['Micheli Frendu', 'Michael', 'Michaele']
This means that every occurrence of a value from the list has to be replaced by the key.
For example, given this sample input:
s = 'concessit et assignavit Micheli Frendu presenti viridarium'
this would be replaced by:
s = 'concessit et assignavit Michaele Frendu presenti viridarium'
The problem arises when the text already contains Michaele Frendu and Michaele is also an item in the list, for example:
s = 'Pro Michaele Frendu contra Lucam Zamit'
This is changing to:
s = 'Pro Michaele Frendu Frendu contra Lucam Zamit'
where my desired output is:
s = 'Pro Michaele Frendu contra Lucam Zamit'
In this case I don't want any replacement as the value is already equal to the key.
I am using this regex pattern, but it is not working:
my_regex = r"\b(?=\w)" + re.escape(l) + r"\b(?!\w)"
s = re.sub(my_regex, k, s)
where k is the key and l is a value from the list.
Upvotes: 1
Views: 52
Reputation: 106455
You can simply put the replacement (the key) first in your regex alternation, so that an existing occurrence of the key is matched first and replaced with itself, taking precedence over the alternative keywords:
import re
d = {'Michaele Frendu': ['Micheli Frendu', 'Michael', 'Michaele']}
s = 'Pro Michaele Frendu contra Lucam Zamit'
for k, v in d.items():
    print(re.sub('|'.join(map(re.escape, (k, *v))), k, s))
This outputs:
Pro Michaele Frendu contra Lucam Zamit
And with s = 'concessit et assignavit Micheli Frendu presenti viridarium', this outputs:
concessit et assignavit Michaele Frendu presenti viridarium
For clarity, note that '|'.join(map(re.escape, (k, *v))) returns the following during the iteration:
Michaele\ Frendu|Micheli\ Frendu|Michael|Michaele
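One caveat with the pattern above: it has no word boundaries, and 'Michael' is tried before 'Michaele', so a standalone 'Michaele' in the text would match as 'Michael' and come out as 'Michaele Frendue'. A possible variation, offered as a sketch rather than the definitive fix, wraps the alternation in \b and tries longer alternatives first. The function name normalize_names and its parameter names are made up for illustration, and the sketch assumes every name starts and ends with a word character so that \b behaves as expected:

```python
import re

def normalize_names(text, variants_by_name):
    """Replace each listed variant with its canonical name (the dict key)."""
    for name, variants in variants_by_name.items():
        # Include the key itself so an existing canonical name is matched
        # (and replaced by itself) instead of being partially rewritten.
        # Longer alternatives are tried first.
        alternatives = sorted({name, *variants}, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b"
        # A function replacement avoids backslash processing in the name.
        text = re.sub(pattern, lambda m: name, text)
    return text

d = {'Michaele Frendu': ['Micheli Frendu', 'Michael', 'Michaele']}
print(normalize_names('Pro Michaele Frendu contra Lucam Zamit', d))
print(normalize_names('Pro Michaele contra Lucam Zamit', d))
```

With the word boundaries in place, 'Michael' can no longer match inside a longer word such as 'Michaelis', and a standalone 'Michaele' is replaced as a whole word.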
Upvotes: 1