Reputation: 33
I'm trying to replace a word (e.g. on) if it falls between two substrings (e.g. <temp> and </temp>); however, other words between them need to be kept.
string = "<temp>The sale happened on February 22nd</temp>"
The desired string after the replace would be:
Result = <temp>The sale happened {replace} February 22nd</temp>
I've tried using regex, but I've only been able to figure out how to replace everything lying between the two <temp> tags (because of the .*? in the pattern).
result = re.sub('<temp>.*?</temp>', '{replace}', string, flags=re.DOTALL)
However, on may also appear later in the string, outside <temp></temp>, and I wouldn't want to replace it there.
Upvotes: 3
Views: 201
Reputation: 3011
re.sub('(<temp>.*?) on (.*?</temp>)', lambda x: x.group(1)+" <replace> "+x.group(2), string, flags=re.DOTALL)
Output:
<temp>The sale happened <replace> February 22nd</temp>
Edit:
Changed the regex based on suggestions by Wiktor and HolyDanna.
P.S: Wiktor's comment on the question provides a better solution.
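Note that the pattern above replaces only the first " on " in each block, and its lazy .*? can run past a closing </temp> if that block has no " on " in it. A minimal sketch of one way around both issues is to match each whole <temp>...</temp> block and do the word replacement inside a callback. The helper name replace_inside_temp and its parameters are made up for illustration, and this is not necessarily what the comment mentioned above proposes.

```python
import re

def replace_inside_temp(text, word="on", replacement="{replace}"):
    """Replace whole-word occurrences of `word`, but only inside <temp>...</temp>."""
    # \b keeps "on" from matching inside longer words such as "Monday".
    word_re = re.compile(rf"\b{re.escape(word)}\b")

    def fix_block(match):
        # match.group(0) is one complete <temp>...</temp> block.
        # A lambda is used so `replacement` is inserted literally.
        return word_re.sub(lambda _: replacement, match.group(0))

    # The lazy .*? stops at the first </temp>, so each block is handled on its own.
    return re.sub(r"<temp>.*?</temp>", fix_block, text, flags=re.DOTALL)

string = "<temp>The sale happened on February 22nd, on time</temp> and later on Monday"
print(replace_inside_temp(string))
```

Here both occurrences inside the tags are replaced, while the on after </temp> is left alone. The sketch assumes the tags are not nested and that the word being replaced is not itself part of the tag name.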
Upvotes: 1
Reputation:
Try lxml:
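from lxml import etree

# Parse the fragment; this assumes the whole string is a single well-formed element.
root = etree.fromstring("<temp>The sale happened on February 22nd</temp>")
# Replace the word only inside the element's own text.
root.text = root.text.replace(" on ", " {replace} ")
# encoding="unicode" returns a str; without it, tostring() returns bytes and prints as b'...'.
print(etree.tostring(root, encoding="unicode"))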
Output:
<temp>The sale happened {replace} February 22nd</temp>
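The question also mentions text after the closing tag, which a single-element parse cannot handle. A rough sketch of how the same idea might extend to that case follows. It swaps lxml for the standard library's xml.etree.ElementTree so it runs without extra installs, and the <root> wrapper and the helper name replace_in_temp are made up for illustration. It assumes the input is well-formed XML once wrapped.

```python
import xml.etree.ElementTree as ET

def replace_in_temp(s, old=" on ", new=" {replace} "):
    # Wrap the fragment in a hypothetical <root> so trailing text and
    # several <temp> elements can be parsed together.
    root = ET.fromstring(f"<root>{s}</root>")
    for el in root.iter("temp"):
        if el.text:
            # Only the element's own text changes; text after </temp>
            # lives in el.tail and is left untouched.
            el.text = el.text.replace(old, new)
    out = ET.tostring(root, encoding="unicode")
    # Strip the wrapper tags that were added above.
    return out[len("<root>"):-len("</root>")]

print(replace_in_temp("<temp>The sale happened on February 22nd</temp> and later on Monday"))
```

Because the result is re-serialized by the parser, special characters such as & may come back in escaped form, so the regex approach may be the safer choice when the input is not really XML.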
Upvotes: 0