James Hallen

Reputation: 5014

Python regex example

I want to replace a pattern in strings with the following structure:

cat&345;
bat &#hut;

I want to replace everything from the & up to, but not including, the ;. What is the best way to do so?

Upvotes: 0

Views: 105

Answers (4)

dawg

Reputation: 104072

You can use negated character classes to do this:

import re

st='''\
cat&345;
bat &#hut;'''

for line in st.splitlines():
    print(line)
    # Keep the text before the &, drop everything from & up to the ;, keep the ;
    print(re.sub(r'([^&]*)&[^;]*;', r'\1;', line))
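
As a variation on the negated character class idea, here is a minimal sketch that substitutes a replacement string instead of deleting the match. The helper name replace_entity_body is hypothetical, and the trailing lookahead is an added assumption so that a stray & with no following ; is left alone.

```python
import re

def replace_entity_body(text, replacement):
    """Replace everything from '&' up to, but not including, the next ';'."""
    # &[^;]* matches '&' plus every following character that is not ';'.
    # (?=;) requires a ';' right after the match without consuming it.
    return re.sub(r"&[^;]*(?=;)", replacement, text)

for line in ("cat&345;", "bat &#hut;", "a & b"):
    print(replace_entity_body(line, "REPL"))
```

Because [^;] can never run past a semicolon, this form needs no non-greedy quantifier.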

Upvotes: 0

John Szakmeister

Reputation: 47082

Maybe go in a different direction altogether and use HTMLParser.unescape() (in Python 3.4 and later, the equivalent is the documented html.unescape() function). The HTMLParser.unescape() method is undocumented, but it doesn't appear to be "internal" because it doesn't have a leading underscore.
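
A short sketch of that approach, assuming Python 3.4 or later, where the standard library exposes html.unescape(). Note that it only converts valid character references, so a string such as "bat &#hut;" from the question is returned unchanged.

```python
import html

# Named and numeric character references are converted to their characters.
print(html.unescape("cat &amp; bat"))
print(html.unescape("&#345;") == chr(345))

# "&#hut;" is not a valid reference, so the text comes back as-is.
print(html.unescape("bat &#hut;"))
```

This only helps if the goal is to decode real HTML entities; for replacing arbitrary &...; spans with other text, a regex is still needed.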

Upvotes: 0

Mark Tolonen

Reputation: 177991

Including or not including the & in the replacement?

>>> re.sub(r'&.*?(?=;)','REPL','cat&345;')           # including
'catREPL;'
>>> re.sub(r'(?<=&).*?(?=;)','REPL','bat &#hut;')    # not including
'bat &REPL;'

Explanation:

  • Although not required here, use an r'raw string' to avoid having to escape the backslashes that often occur in regular expressions.
  • .*? is a "non-greedy" match of any characters (except a newline), which makes the match stop at the first semicolon.
  • (?=;) the match must be followed by a semicolon, but it is not included in the match.
  • (?<=&) the match must be preceded by an ampersand, but it is not included in the match.
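
To illustrate the points above, here is a small sketch comparing the greedy and non-greedy forms on a single line that contains two &...; spans. The sample string is made up by joining the two lines from the question.

```python
import re

text = "cat&345; bat &#hut;"

# Greedy: .* runs to the last semicolon, swallowing everything in between.
greedy = re.sub(r"&.*(?=;)", "REPL", text)

# Non-greedy: .*? stops at the first semicolon after each ampersand.
lazy = re.sub(r"&.*?(?=;)", "REPL", text)

# The lookbehind keeps the ampersand out of the match as well.
inner = re.sub(r"(?<=&).*?(?=;)", "REPL", text)

print(greedy)  # catREPL;
print(lazy)    # catREPL; bat REPL;
print(inner)   # cat&REPL; bat &REPL;
```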

Upvotes: 1

aaronman

Reputation: 18751

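Here is a regex that works:
import re
# .*? is non-greedy, so each match stops at the first ; after the &
result = re.sub(r"(?<=&).*?(?=;)", replacementstr, searchText)

This puts the replacement between the & and the ;, where replacementstr is the new text and searchText is the input string.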

Upvotes: 1
