Reputation: 5014
I want to replace a pattern in strings with the following structure:
cat&345;
bat &#hut;
The part to replace starts at & and ends before (not including) the ;. What is the best way to do so?
Upvotes: 0
Views: 105
Reputation: 104072
You can use negated character classes to do this:
import re
st='''\
cat&345;
bat &#hut;'''
for line in st.splitlines():
    print(line)
    # keep the text before &, drop everything from & up to ;, keep the ;
    print(re.sub(r'([^&]*)&[^;]*;', r'\1;', line))
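The snippet above removes the text between & and ;. If the goal is to substitute something else, the same negated-character-class idea can be adapted. The following is only a sketch: the helper name replace_entity and the replacement string 'REPL' are made up for illustration, and it assumes every & that should be replaced is followed by a ; somewhere later in the text.

```python
import re

# Hypothetical helper, not part of the answer above.
def replace_entity(text, repl):
    # & starts the match, [^;]* consumes everything that is not a semicolon,
    # and (?=;) requires a following semicolon without consuming it.
    # The lambda makes re.sub insert repl literally (no backslash processing).
    return re.sub(r'&[^;]*(?=;)', lambda m: repl, text)

print(replace_entity('cat&345;', 'REPL'))    # catREPL;
print(replace_entity('bat &#hut;', 'REPL'))  # bat REPL;
```

Because [^;] can never match a semicolon, the match stops at the first ; without needing a non-greedy quantifier.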
Upvotes: 0
Reputation: 47082
Maybe go in a different direction altogether and use HTMLParser.unescape(). The unescape() method is undocumented, but it doesn't appear to be "internal" because it doesn't have a leading underscore.
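On Python 3.4 and later, the documented html.unescape() function does the same job; HTMLParser.unescape() was deprecated in 3.4 and removed in 3.9. A minimal sketch, assuming the input contains valid HTML character references (the wrapper name decode_entities is hypothetical):

```python
import html

# Hypothetical wrapper around the standard-library function.
def decode_entities(text):
    # Converts named and numeric character references to the
    # corresponding characters.
    return html.unescape(text)

print(decode_entities('cat &amp; bat'))  # cat & bat
print(decode_entities('&#65;'))          # A
```

This only decodes well-formed references, so it is an option when the goal is to turn entities back into characters rather than to substitute an arbitrary replacement string.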
Upvotes: 0
Reputation: 177991
Including or not including the & in the replacement?
>>> re.sub(r'&.*?(?=;)','REPL','cat&345;') # including
'catREPL;'
>>> re.sub(r'(?<=&).*?(?=;)','REPL','bat &#hut;') # not including
'bat &REPL;'
r'raw string' is used to avoid having to escape backslashes, which often occur in regular expressions.
.*? is a "non-greedy" match of anything, which makes the match stop at the first semicolon.
(?=;) means the match must be followed by a semicolon, but the semicolon is not included in the match.
(?<=&) means the match must be preceded by an ampersand, but the ampersand is not included in the match.
Upvotes: 1
Reputation: 18751
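Here is a regex that does this:
import re
# replacementstr is the replacement text, searchText is the input string
result = re.sub(r"(?<=&).*?(?=;)", replacementstr, searchText)
This puts the replacement between the & and the ;. The non-greedy .*? stops at the first ;, so several & ... ; pairs on one line are each replaced separately.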
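The choice between a greedy and a non-greedy quantifier matters as soon as a string contains more than one & ... ; pair. A short sketch comparing the two (the sample text and the 'REPL' replacement are made up for illustration):

```python
import re

text = 'cat&345; bat &#hut;'

# Greedy: .* runs to the last semicolon, swallowing everything in between.
greedy = re.sub(r'(?<=&).*(?=;)', 'REPL', text)

# Non-greedy: .*? stops at the first semicolon after each ampersand.
lazy = re.sub(r'(?<=&).*?(?=;)', 'REPL', text)

print(greedy)  # cat&REPL;
print(lazy)    # cat&REPL; bat &REPL;
```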
Upvotes: 1