hoju

Reputation: 29482

Python regular expression inconsistency

I am getting different results based on whether I precompile a regular expression:

>>> re.compile('mr', re.IGNORECASE).sub('', 'Mr Bean')
' Bean'
>>> re.sub('mr', '', 'Mr Bean', re.IGNORECASE)
'Mr Bean'

The Python documentation says "Some of the functions are simplified versions of the full featured methods for compiled regular expressions." However, it also claims that RegexObject.sub() is "Identical to the sub() function."

So what is going on here?

Upvotes: 2

Views: 870

Answers (4)

miku

Reputation: 188194

>>> help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.

In this version of Python, re.sub has no parameter for regex flags (IGNORECASE, MULTILINE, DOTALL), unlike re.compile.

Alternatives:

>>> re.sub("[Mm]r", "", "Mr Bean")
' Bean'

>>> re.sub("(?i)mr", "", "Mr Bean")
' Bean'

Edit: Python 3.1 added support for regex flags in these functions (see http://docs.python.org/3.1/whatsnew/3.1.html). As of 3.1, the signature of e.g. re.sub looks like:

re.sub(pattern, repl, string[, count, flags])
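
On a Python version whose re.sub has the flags parameter shown above, passing the flag by keyword should keep it out of the positional count slot. A minimal sketch, reusing the question's pattern and string (the variable name is only for illustration):

```python
import re

# Pass the flag by keyword so it is not read as the positional count argument.
result = re.sub('mr', '', 'Mr Bean', flags=re.IGNORECASE)
print(repr(result))
```

This should print ' Bean', matching the precompiled version from the question.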

Upvotes: 4

Evan Fosmark

Reputation: 101811

It appears that re.sub() can't accept re.IGNORECASE.

The documentation states:

sub(pattern, repl, string, count=0)

Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl.  repl can be either a string or a callable;
if a string, backslash escapes in it are processed.  If it is
a callable, it's passed the match object and must return
a replacement string to be used.

Using this works in its place, however:

re.sub("(?i)mr", "", "Mr Bean")

Upvotes: 12

Chinmay Kanchi

Reputation: 66063

From the Python 2.6.4 documentation:

re.sub(pattern, repl, string[, count])

In that version, re.sub() doesn't take a flag to set the regex mode. If you want re.IGNORECASE, you must use re.compile().sub().

Upvotes: 2

zzzeek

Reputation: 75317

The module-level sub() call doesn't accept modifiers at the end. That's the "count" argument: the maximum number of pattern occurrences to be replaced.
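
To see the flag being treated as a count, here is a small sketch. It assumes CPython, where re.IGNORECASE has the integer value 2; the variable names and the 'aaaa' test string are illustrative. Recent Python versions may emit a DeprecationWarning when count is passed positionally.

```python
import re

# re.IGNORECASE is an integer-valued flag (2 in CPython), so in the fourth
# positional slot it is interpreted as count=2, not as a flag.
as_count = int(re.IGNORECASE)

# The match stays case-sensitive, so 'mr' never matches 'Mr' and nothing changes.
unchanged = re.sub('mr', '', 'Mr Bean', re.IGNORECASE)

# With a pattern that does match, only the first two occurrences are removed.
limited = re.sub('a', '', 'aaaa', re.IGNORECASE)

print(as_count, repr(unchanged), repr(limited))
```

The second call reproduces the question's result, and the third shows the replacement stopping after two matches.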

Upvotes: 5
