Reputation: 177
I am working with the Python 3 re module to strip a string of anything that is not a digit or a '.'.
My first try was this:
r = re.sub('[^0-9].', '', s)
And of course, any time I had a '.' in there it wouldn't work right. So I added a backslash in front of the '.' and it works perfectly.
My question is: while I understand why the first expression didn't work, I do not understand why it would match both the '.' character and the character immediately after it.
What I would have expected from reading the documentation is that, given a string of '15.45', I would have ended up with a string like this: '1545', since the '.' would match all characters except the 0-9 that I already excluded.
Can someone enlighten me as to what is happening here?
Upvotes: 0
Views: 59
Reputation: 21274
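With [^0-9]. you're matching two characters in a row:
Match 1: Something that is not a digit ([^0-9])
Match 2: Anything (.)
In '15.45', the '.' satisfies [^0-9] and the '4' after it satisfies the unescaped ., so both are removed and you get '155'.
Put the period (.) inside your "not these characters" set instead: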
import re
s = "b15_.45a"
re.sub('[^.0-9]+', '', s)
# '15.45'
That will give you "a string of anything not a digit or a '.'".
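To see the two-character match directly, here is a small sketch comparing the patterns. The variable names are only illustrative, and the escaped pattern shown is one reading of "a backslash in front of the '.'" (placed outside the brackets), which may not be exactly what the asker wrote:

```python
import re

s = "15.45"

# Original pattern: one non-digit, then any one character.
# The '.' in the string is the non-digit, and '4' is the "any character".
pairs = re.findall('[^0-9].', s)
stripped = re.sub('[^0-9].', '', s)

# Escaped dot outside the class (an assumed reading of the asker's fix):
# this only removes a non-digit that is immediately followed by a literal '.'.
escaped = re.sub(r'[^0-9]\.', '', "b15_.45a")

# Dot inside the negated class: remove anything that is not a digit or '.'.
kept = re.sub('[^.0-9]+', '', "b15_.45a")

print(pairs)     # ['.4']
print(stripped)  # 155
print(escaped)   # b1545a
print(kept)      # 15.45
```

Note that the escaped form leaves '15.45' untouched only because that string has no non-digit sitting right before the dot. On messier input it removes the dot and keeps the letters, so the negated class with the dot inside is likely the safer choice.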
Upvotes: 2