Tyler

Reputation: 322

Python regex matching when it should not

I have a list of strings and I want to print out the ones that don't match the regex, but I'm having some trouble. The regex seems to match strings that it should not whenever a prefix of the string (a substring starting at the very beginning) matches the regex. I'm not sure how to fix this.

Example

>>> import re
>>> pattern = re.compile(r'\d+')
>>> string = u"1+*"
>>> bool(pattern.match(string))
True

I get true because of the 1 at the start. How should I change my regex to account for this?
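One way to see what is happening is to inspect the match object itself. The sketch below (variable names are just for illustration) shows that `re.match` succeeds after consuming only the leading `1`, because it anchors the pattern at the start of the string but not at the end:

```python
import re

pattern = re.compile(r'\d+')
m = pattern.match(u"1+*")

# re.match only requires the pattern to match at the start of the string;
# it does not require the match to cover the whole string.
print(m.group())  # the text that was matched: 1
print(m.end())    # index where the match stopped: 1
```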

Note: I'm on Python 2.6.6.

Upvotes: 0

Views: 155

Answers (2)

Alex Huszagh

Reputation: 14614

You should append \Z to the end of the regex, so the regex pattern is '\d+\Z'.

Your code then becomes:

>>> import re
>>> pattern = re.compile(r'\d+\Z')
>>> string = u"1+*"
>>> bool(pattern.match(string))
False

This works because \Z matches only at the very end of the string, so the run of digits has to extend all the way to the end. You may also use $, which matches either at the end of the string or just before a newline at the end of the string (so u"123\n" would still match). If you want the pattern itself to require that the whole string consist of digits, add ^ to the front, which forces the match to begin at the start of the string. This is redundant with re.match, which already matches only at the beginning, but it matters with re.search or in other regular expression libraries. The pattern would then be '^\d+\Z'.
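A short sketch illustrating the difference between $ and \Z, and where ^ makes a difference. The sample strings are made up for illustration; the behaviour shown follows Python's documented re semantics, where $ also matches just before a trailing newline:

```python
from __future__ import print_function  # so print() behaves the same on 2.6 and 3
import re

# With $, the match may also end just before a trailing newline.
dollar = re.compile(r'\d+$')
# With \Z, the match must end at the absolute end of the string.
end_z = re.compile(r'\d+\Z')

samples = [u"123", u"123\n", u"1+*"]
for s in samples:
    print(repr(s), bool(dollar.match(s)), bool(end_z.match(s)))

# ^ matters with re.search, which may start matching anywhere in the string.
print(bool(re.search(r'\d+\Z', u"abc123")))   # True: finds "123" at the end
print(bool(re.search(r'^\d+\Z', u"abc123")))  # False: string does not start with a digit
```

For u"123\n" the $ version reports a match while the \Z version does not, which is why \Z is the stricter choice when the whole string must be digits.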

Upvotes: 1

Josh Withee

Reputation: 11336

Have your regex start with \A and end with \Z. This will make sure that the match begins at the start of the input string, and also make sure that the match ends at the end of the input string.

So for the example you gave, it would look like:

pattern = re.compile(r'\A\d+\Z')
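Applied to the original goal of printing the strings that don't match, a minimal sketch might look like this. The strings list is a hypothetical stand-in for the asker's data; the code uses only re.compile and match, so it should run on Python 2.6 as well as Python 3:

```python
import re

# Anchored at both ends: the whole string must be one or more digits.
pattern = re.compile(r'\A\d+\Z')

# Hypothetical input list, for illustration only.
strings = [u"123", u"1+*", u"abc", u"42", u""]

# Keep only the strings that do NOT match the pattern as a whole.
non_matching = [s for s in strings if not pattern.match(s)]
for s in non_matching:
    print(s)
```

On Python 3.4 and later, pattern.fullmatch(s) performs the same whole-string check without explicit anchors, but it is not available on 2.6.6.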

Upvotes: 2
