Python re.match not strict enough

Question

I have a problem understanding regexp matching. In short, this small script gives the wrong result.

In:

#!/usr/bin/env python

import re

base = '/show/summer/2015/party/my_brand/'

pt1 = '^/show/(?P.+)/(?P[0-9]+)/(?P.+)/(?P.+)/$'
pt2 = '^/show/(?P.+)/(?P[0-9]+)/(?P.+)/$'


print base, '==', pt1, re.match(pt1, base) is not None
print base, '==', pt2, re.match(pt2, base) is not None

Out:

/show/summer/2015/party/my_brand/ == ^/show/(?P.+)/(?P[0-9]+)/(?P.+)/(?P.+)/$ True
/show/summer/2015/party/my_brand/ == ^/show/(?P.+)/(?P[0-9]+)/(?P.+)/$ True

Clearly, I was expecting only pt1 to match. I am pretty sure my pattern is wrong and that I should change something to make it more greedy (this is guesswork).

Can anyone point out what I am missing about regexps?

khelwood · Accepted Answer

In pt2, the last "(?P.+)" will match "party/my_brand" because . matches any character except a newline, and that includes the slash.

To prevent it matching a slash, you could use:

pt2 = '^/show/(?P[^/]+)/(?P[0-9]+)/(?P[^/]+)/$'

where [^/] means "any character that is not a slash".
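
To see the difference concretely, here is a minimal sketch that runs both styles of pattern against the URL from the question. The group names (season, year, tail, brand) are hypothetical placeholders chosen for illustration, not taken from the question, and the variable names are likewise assumptions.

```python
import re

base = '/show/summer/2015/party/my_brand/'

# Group names below are hypothetical placeholders for illustration only.
# Three-segment pattern using ".+", which can run across slashes.
loose3 = re.compile(r'^/show/(?P<season>.+)/(?P<year>[0-9]+)/(?P<tail>.+)/$')

# Same three-segment pattern, but each segment is "[^/]+" (no slashes allowed).
strict3 = re.compile(r'^/show/(?P<season>[^/]+)/(?P<year>[0-9]+)/(?P<tail>[^/]+)/$')

# Four-segment pattern with slash-free segments.
strict4 = re.compile(
    r'^/show/(?P<season>[^/]+)/(?P<year>[0-9]+)/(?P<tail>[^/]+)/(?P<brand>[^/]+)/$'
)

loose_match = loose3.match(base)
strict3_match = strict3.match(base)
strict4_match = strict4.match(base)

# The loose pattern matches, and its last group swallows two path segments.
print(loose_match.group('tail'))    # party/my_brand

# The strict three-segment pattern no longer matches the four-segment URL.
print(strict3_match)                # None

# The strict four-segment pattern matches, one path segment per group.
print(strict4_match.groupdict())
```

With the slash excluded from each group, the number of groups in the pattern has to equal the number of path segments, which is the strictness the question was after.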
