Drachenfels

Reputation: 3286

Python re.match not strict enough

I have a problem understanding regexp matching. In short, this small script gives the wrong result.

In:

#!/usr/bin/env python

import re

base = '/show/summer/2015/party/my_brand/'

pt1 = '^/show/(?P<season>.+)/(?P<year>[0-9]+)/(?P<type>.+)/(?P<brand>.+)/$'
pt2 = '^/show/(?P<season>.+)/(?P<year>[0-9]+)/(?P<type>.+)/$'


print(base, '==', pt1, re.match(pt1, base) is not None)
print(base, '==', pt2, re.match(pt2, base) is not None)

Out:

/show/summer/2015/party/my_brand/ == ^/show/(?P<season>.+)/(?P<year>[0-9]+)/(?P<type>.+)/(?P<brand>.+)/$ True
/show/summer/2015/party/my_brand/ == ^/show/(?P<season>.+)/(?P<year>[0-9]+)/(?P<type>.+)/$ True

Clearly, I was expecting only pt1 to match. I am pretty sure my pattern is wrong and that I should change something to make it more greedy (this is guesswork).

Can anyone point out what I am missing about regexps?

Upvotes: 3

Views: 127

Answers (2)

pzelasko

Reputation: 2322

I think you should use [^/]+ instead of .+ to capture the text; otherwise the dot can also match the slash.
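
A minimal sketch of the difference, using a fragment of the path from the question (the variable names here are made up for illustration):

```python
import re

# Fragment of the URL from the question that the 'type' group has to deal with.
path = 'party/my_brand'

# '.' matches any character except a newline, so '.+' runs straight across the slash.
dot_match = re.match(r'.+', path).group()

# '[^/]' excludes the slash, so the match stops at the first '/'.
segment_match = re.match(r'[^/]+', path).group()

print(dot_match)      # party/my_brand
print(segment_match)  # party
```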

Upvotes: 2

khelwood

Reputation: 59113

"(?P<type>.+)" will match "party/my_brand" because . matches any character (including the slash).

To prevent it matching a slash, you could use:

pt2 = '^/show/(?P<season>[^/]+)/(?P<year>[0-9]+)/(?P<type>[^/]+)/$'

where [^/] means "any character that is not a slash".
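
As a rough check of the above, here is a sketch that runs the original pt2 and the [^/]+ versions against the URL from the question. Applying the same change to pt1, and the names old_pt2 and old_groups, are my assumptions for the sake of the example:

```python
import re

base = '/show/summer/2015/party/my_brand/'

# Original pt2 from the question: '.+' is free to swallow slashes.
old_pt2 = r'^/show/(?P<season>.+)/(?P<year>[0-9]+)/(?P<type>.+)/$'

# Same patterns with every '.+' replaced by '[^/]+' (one path segment each).
pt1 = r'^/show/(?P<season>[^/]+)/(?P<year>[0-9]+)/(?P<type>[^/]+)/(?P<brand>[^/]+)/$'
pt2 = r'^/show/(?P<season>[^/]+)/(?P<year>[0-9]+)/(?P<type>[^/]+)/$'

# The original pt2 matches because 'type' captures two segments at once.
old_groups = re.match(old_pt2, base).groupdict()
print(old_groups['type'])  # party/my_brand

# With '[^/]+', only the four-segment pattern matches the URL.
print(re.match(pt1, base) is not None)  # True
print(re.match(pt2, base) is not None)  # False
```

With the stricter character class, each named group can hold exactly one path segment, so the number of groups in the pattern has to equal the number of segments in the URL.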

Upvotes: 3
