Reputation: 14112
The problem: I want to remove a specific code within a question. The code changes position from question to question so I cannot rely on the position of the code to remove it.
Here is what it looks like:
Now thinking specifically about the home improvement brand _Everest<.br/>On a scale of 0 to 10, where 0 is "Not at all familiar/ knowledgeable" and 10 is "Very familiar/ knowledgeable", how familiar / knowledgeable do you consider yourself to be with..."
The code (<.br>) is always attached to the word before it and the word after it.
Solution: I would like to know whether there is a function that removes a set of characters that starts with x and ends with x, along with everything in between.
I hope this makes sense.
Upvotes: 0
Views: 345
Reputation: 213025
import re

def remove_between_anchors(text, anchor):
    # non-greedy match: from one anchor to the next occurrence of the same anchor
    return re.sub(r'{0}.+?{0}'.format(anchor), '', text)

remove_between_anchors('123aa456aa789', 'aa')  # returns '123789'
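One caveat worth noting: the anchor is inserted into the pattern as-is, so an anchor containing regex metacharacters (such as `*` or `|`) is interpreted as pattern syntax rather than literal text, and `.` does not match newlines by default. A minimal sketch of a variant that treats the anchor literally and also spans line breaks, assuming that is the desired behaviour (the name `remove_between_literal_anchors` is made up for illustration):

```python
import re

def remove_between_literal_anchors(text, anchor):
    # re.escape makes every character of the anchor match literally
    escaped = re.escape(anchor)
    # re.DOTALL lets '.' match newlines, so the removed span may cross lines
    return re.sub(escaped + r'.+?' + escaped, '', text, flags=re.DOTALL)

print(remove_between_literal_anchors('1*2*3', '*'))         # 13
print(remove_between_literal_anchors('a||b\nc||d', '||'))   # ad
```

If the anchor is meant to be a regex itself, skip the `re.escape` call.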
EDIT: if the start/end anchors are different:
def remove_between_anchors(text, start, end):
    # non-greedy match: from each start anchor to the nearest end anchor
    return re.sub(r'{0}.+?{1}'.format(start, end), '', text)

remove_between_anchors('123<abc>456', '<', '>')  # returns '123456'
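Applied to the kind of text in the question, a rough sketch might look like the following. It assumes the code always starts with `<`, ends with `>`, and never contains a `>` inside it. The function name `strip_tag_codes` and the `replacement` parameter are hypothetical. Because the code sits directly between two words, plain removal glues them together, so passing a space as the replacement may be preferable.

```python
import re

def strip_tag_codes(text, replacement=''):
    # '<[^>]*>' matches '<', then any run of characters other than '>', then '>'
    return re.sub(r'<[^>]*>', replacement, text)

question = 'brand _Everest<.br/>On a scale of 0 to 10'
print(strip_tag_codes(question))        # brand _EverestOn a scale of 0 to 10
print(strip_tag_codes(question, ' '))   # brand _Everest On a scale of 0 to 10
```

The character class `[^>]*` is used here instead of `.+?` so that the match cannot run past the first closing `>`, and it also works regardless of where the code appears in the string.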
Upvotes: 2