Reputation: 1191
The following code:
import re
print(re.sub('[^a-zA-Z0-9]', '', ',Inc.', re.IGNORECASE).lower())
print(re.sub('[^a-zA-Z0-9]', '', ', Inc.', re.IGNORECASE).lower())
produces:
inc
inc.
https://repl.it/repls/RightThankfulMaintenance
Why?
Upvotes: 2
Views: 90
Reputation: 881333
From the doco, the re.sub signature is:
re.sub(pattern, repl, string, count=0, flags=0)
So, let's examine your call based on that:
re.sub('[^a-zA-Z0-9]', '',     ', Inc.', re.IGNORECASE)  # flags stays at its default of 0
#      <   pattern  >  <repl>  <string>  <   count   >
You are passing the flag re.IGNORECASE as the count argument, not as flags. It has the value 2 (try print(int(re.IGNORECASE))), though I suspect that's not mandated anywhere.
So it only does up to two substitutions, which in your second example are the comma and the space at the start, leaving the trailing period in place. The same limit applied in your first example; it's just that only two characters matched there (the comma and the period) rather than three, so you didn't notice.
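To see the effect of the limit directly, here is a minimal sketch that counts the matches in each input and compares a capped substitution with an uncapped one. The names PATTERN and limit are just for illustration, and it assumes int(re.IGNORECASE) is 2, as it is on CPython:

```python
import re

PATTERN = '[^a-zA-Z0-9]'

# The original call effectively used the flag's integer value as count.
# Assumption: this is 2 (true on CPython, but not documented as guaranteed).
limit = int(re.IGNORECASE)

for text in (',Inc.', ', Inc.'):
    matches = re.findall(PATTERN, text)                # every character that could be removed
    capped = re.sub(PATTERN, '', text, count=limit)    # what the original call did
    uncapped = re.sub(PATTERN, '', text)               # count=0 means "replace all"
    print(repr(text), len(matches), repr(capped), repr(uncapped))
```

With a limit of 2, the first input has exactly two matches, so the capped and uncapped results agree; the second has three, so the capped result keeps its trailing period.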
Instead, you should use:
>>> re.sub('[^a-zA-Z0-9]', '', ', Inc.', flags=re.IGNORECASE).lower()
'inc'
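Another way to avoid mixing up count and flags is to compile the pattern once, so the flags go to re.compile and never share an argument list with count. This is only a sketch: NON_ALNUM and normalize are hypothetical names, not anything from your code.

```python
import re

# Flags are attached at compile time, so they cannot land in the count slot.
NON_ALNUM = re.compile('[^a-zA-Z0-9]', flags=re.IGNORECASE)

def normalize(name):
    """Strip every non-alphanumeric character and lowercase the rest."""
    # Pattern.sub(repl, string, count=0): count defaults to 0, i.e. replace all.
    return NON_ALNUM.sub('', name).lower()

print(normalize(',Inc.'))
print(normalize(', Inc.'))
```

Both calls should now give the same result, since no replacement limit is in play.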
Upvotes: 3