Sean O

Reputation: 177

What is happening with this regular expression?

I am using the Python 3 re module to strip everything from a string that is not a digit or a '.'.

My first try was this:

r = re.sub('[^0-9].', '', s)

Of course, any time the string contained a '.' it didn't work right. So I added a backslash in front of the '.' and it seems to work perfectly.

My question is, while I understand why the first expression didn't work, I do not understand why it would match both the '.' character and the character immediately after it.

What I would have expected from reading the documentation is that given a string of '15.45' I would have ended up with a string like this: '1545', since the '.' would match all characters except the 0-9 that I already excluded.

Can someone enlighten me as to what is happening here?

Upvotes: 0

Views: 59

Answers (1)

andrew_reece

Reputation: 21274

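The pattern [^0-9]. matches two consecutive characters:

First character: something that is not a digit ([^0-9])
Second character: anything except a newline (.)

In '15.45' the '.' is the non-digit, and the '4' after it is consumed by the unescaped dot, so both are removed.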
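To see this on the string from the question, here is a minimal sketch (the variable names are only for illustration):

```python
import re

s = "15.45"

# '[^0-9].' needs two characters: one non-digit, then any character.
# Here the '.' is the non-digit and the '4' is the "any character".
matches = re.findall(r"[^0-9].", s)
print(matches)  # ['.4']

# Removing that two-character match drops the '.' and the '4'.
result = re.sub(r"[^0-9].", "", s)
print(result)  # 155
```

So the result is '155' rather than the '1545' the question expected.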

Put the period inside your "not these characters" set instead. Inside square brackets a '.' is a literal period, so it needs no backslash:

import re
s = "b15_.45a"
re.sub('[^.0-9]+', '', s)
# '15.45'

That will give you "a string of anything not a digit or a '.'".
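Note that the backslash version from the question may only appear to work. Assuming that pattern was '[^0-9]\.' (the question does not show it exactly), it removes a non-digit only when a literal '.' follows it, so '15.45' comes back unchanged simply because nothing matches. A rough comparison:

```python
import re

# Assumed form of the "backslash" pattern from the question.
escaped = r"[^0-9]\."
# Negated class that excludes both digits and the literal '.'.
in_class = r"[^.0-9]+"

# Nothing matches the escaped pattern here, so the string is untouched.
unchanged = re.sub(escaped, "", "15.45")
print(unchanged)  # 15.45

# With other junk in the string, only the '_.' pair is removed.
partial = re.sub(escaped, "", "b15_.45a")
print(partial)  # b1545a

# The negated class removes every character that is not a digit or '.'.
cleaned = re.sub(in_class, "", "b15_.45a")
print(cleaned)  # 15.45
```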

Upvotes: 2
