Reputation: 3286
I am having trouble understanding regexp matching. In short, this small script gives a result I did not expect.
In:
#!/usr/bin/env python
import re
base = '/show/summer/2015/party/my_brand/'
pt1 = '^/show/(?P<season>.+)/(?P<year>[0-9]+)/(?P<type>.+)/(?P<brand>.+)/$'
pt2 = '^/show/(?P<season>.+)/(?P<year>[0-9]+)/(?P<type>.+)/$'
print base, '==', pt1, re.match(pt1, base) is not None
print base, '==', pt2, re.match(pt2, base) is not None
Out:
/show/summer/2015/party/my_brand/ == ^/show/(?P<season>.+)/(?P<year>[0-9]+)/(?P<type>.+)/(?P<brand>.+)/$ True
/show/summer/2015/party/my_brand/ == ^/show/(?P<season>.+)/(?P<year>[0-9]+)/(?P<type>.+)/$ True
Clearly I was expecting only pt1 to match. I am pretty sure my pattern is wrong and that I should change something to make it more greedy (that part is guesswork).
Can anyone point out what I am missing about regexps?
Upvotes: 3
Views: 127
Reputation: 2322
I think you should use [^/]+ instead of .+ to capture the text; otherwise the dot can also match the slash.
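To see the dot swallowing a slash, it may help to print what each named group actually captured. This is only a small sketch reusing base and pt2 from the question; the groups variable is introduced here purely for illustration.

```python
import re

base = '/show/summer/2015/party/my_brand/'
pt2 = r'^/show/(?P<season>.+)/(?P<year>[0-9]+)/(?P<type>.+)/$'

m = re.match(pt2, base)
groups = m.groupdict()  # illustrative name, not from the question

# Print each named group to see which part of the path it consumed.
for name in ('season', 'year', 'type'):
    print('%s = %r' % (name, groups[name]))
# season = 'summer'
# year = '2015'
# type = 'party/my_brand'
```

The type group spans two path segments, which is why pt2 still matches the longer URL.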
Upvotes: 2
Reputation: 59113
"(?P<type>.+)"
will match "party/my_brand" because . matches any character other than a newline (including the slash).
To prevent it matching a slash, you could use:
pt2 = '^/show/(?P<season>[^/]+)/(?P<year>[0-9]+)/(?P<type>[^/]+)/$'
where [^/] means "any character that is not a slash".
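As a quick check, here is a sketch that applies the same [^/]+ change to both patterns from the question and tests them against base. The pt2_lazy pattern and the *_matches names are hypothetical, added only for comparison: pt2_lazy shows that switching the original .+ to the non-greedy .+? would not help, since the anchored pattern still forces the dot to expand across the slash.

```python
import re

base = '/show/summer/2015/party/my_brand/'

# [^/]+ matches one or more non-slash characters,
# so each group is confined to a single path segment.
pt1 = r'^/show/(?P<season>[^/]+)/(?P<year>[0-9]+)/(?P<type>[^/]+)/(?P<brand>[^/]+)/$'
pt2 = r'^/show/(?P<season>[^/]+)/(?P<year>[0-9]+)/(?P<type>[^/]+)/$'

# Hypothetical comparison: the original pt2 with lazy quantifiers.
pt2_lazy = r'^/show/(?P<season>.+?)/(?P<year>[0-9]+)/(?P<type>.+?)/$'

pt1_matches = re.match(pt1, base) is not None
pt2_matches = re.match(pt2, base) is not None
pt2_lazy_matches = re.match(pt2_lazy, base) is not None

print(pt1_matches)       # True
print(pt2_matches)       # False
print(pt2_lazy_matches)  # True: laziness alone does not exclude the slash
```

With the character class, only the four-segment pattern matches the four-segment path, which is the behaviour the question expected.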
Upvotes: 3