Reputation: 1418
I've spent a couple of hours trying to match the URLs below and can't figure it out, though I'm quite sure it's not that difficult.
The URL can be this:
/course/lesson-one/
or it can also be:
/course/lesson-one/chapter-one/
What I have is the following which matches the second URL:
/course/([a-zA-Z]+[-a-zA-Z]*)/([a-zA-Z]+[-a-zA-Z]*)/
What I want is for the second part to be optional, but I can't figure it out. The closest I got was the following:
/course/([a-zA-Z]+[-a-zA-Z]*)/*([a-zA-Z]+[-a-zA-Z]*)/
But for some reason the above leaves out the last letter of the word. For example, if the URL is
/course/computers/
I end up with the string 'computer'
Upvotes: 1
Views: 626
Reputation: 523724
Use ? if you need optional parts:
/course/([a-zA-Z][-a-zA-Z]*)/([a-zA-Z][-a-zA-Z]*/)?
#                                                 ^
(Note that [a-zA-Z]+[-a-zA-Z]* is equivalent to [a-zA-Z][-a-zA-Z]*.)
Use an additional non-capturing grouping (?:…) to keep the / out of the captured group, while still allowing multiple elements to be optional at once:
/course/([a-zA-Z][-a-zA-Z]*)/(?:([a-zA-Z][-a-zA-Z]*)/)?
#                            ~~~                     ~^
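As a quick check, here is a minimal Python sketch of the non-capturing version applied to the URLs from the question. The names PATTERN and parse_course_url are made up for illustration, and using re.fullmatch to require that the whole string matches is my assumption, not something the question asked for.

```python
import re

# The chapter segment and its trailing slash sit inside an optional
# non-capturing group, so only the chapter name itself is captured.
PATTERN = re.compile(r"/course/([a-zA-Z][-a-zA-Z]*)/(?:([a-zA-Z][-a-zA-Z]*)/)?")


def parse_course_url(url):
    """Hypothetical helper: return (lesson, chapter), or None if no match.

    chapter is None when the URL has no second segment.
    """
    m = PATTERN.fullmatch(url)
    if m is None:
        return None
    return m.group(1), m.group(2)


print(parse_course_url("/course/lesson-one/"))              # ('lesson-one', None)
print(parse_course_url("/course/lesson-one/chapter-one/"))  # ('lesson-one', 'chapter-one')
print(parse_course_url("/course/computers/"))               # ('computers', None)
```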
Your 2nd regex swallows the last character, because:
/course/([a-zA-Z]+[-a-zA-Z]*)/*([a-zA-Z]+[-a-zA-Z]*)/
        ^^^^^^^^^^^^^^^^^^^^^  ~~~~~~~~~~~~~~~~~~~~~
        this matches 'computer' and this matches the 's'.
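The second group in this regex must match at least one letter because of the +, while /* is allowed to match zero slashes. For /course/computers/ the engine therefore backtracks: the first group gives up the 's' so that the second group has something to match.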
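The backtracking can be seen directly in Python. This is only a small sketch; the variable names attempt, m and m2 are arbitrary.

```python
import re

# The asker's second attempt: "/*" permits zero slashes between the groups,
# and the second group still requires at least one letter.
attempt = re.compile(r"/course/([a-zA-Z]+[-a-zA-Z]*)/*([a-zA-Z]+[-a-zA-Z]*)/")

m = attempt.match("/course/computers/")
print(m.groups())  # ('computer', 's')

# With two real segments the same pattern behaves as intended.
m2 = attempt.match("/course/lesson-one/chapter-one/")
print(m2.groups())  # ('lesson-one', 'chapter-one')
```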
Upvotes: 1
Reputation: 122516
You can use the following regex:
'/course/([a-zA-Z]+[-a-zA-Z]*)(/([a-zA-Z]+[-a-zA-Z]*)/)?'
This makes the second part optional and still matches each of the parts of the URL.
Note that the second part of the URL has two groups: one that matches /chapter-one/ and one that matches chapter-one:
>>> import re
>>> re.match('/course/([a-zA-Z]+[-a-zA-Z]*)(/([a-zA-Z]+[-a-zA-Z]*)/)?', '/course/lesson-one/chapter-one/').groups()
('lesson-one', '/chapter-one/', 'chapter-one')
Similarly:
>>> re.match('/course/([a-zA-Z]+[-a-zA-Z]*)(/([a-zA-Z]+[-a-zA-Z]*)/)?', '/course/lesson-one/').groups()
('lesson-one', None, None)
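If the extra outer group gets in the way, one option is to read the parts by name rather than by position. A sketch, assuming the group names lesson and chapter and the helper split_url (all my own labels, not part of the regex above):

```python
import re

# Same regex as above, with the two interesting groups given names.
# The outer optional group stays unnamed and is simply ignored.
pattern = re.compile(
    r"/course/(?P<lesson>[a-zA-Z]+[-a-zA-Z]*)"
    r"(/(?P<chapter>[a-zA-Z]+[-a-zA-Z]*)/)?"
)


def split_url(url):
    """Hypothetical helper: return the named parts as a dict, or None."""
    m = pattern.match(url)
    return m.groupdict() if m else None


print(split_url("/course/lesson-one/chapter-one/"))
print(split_url("/course/lesson-one/"))
```

groupdict() only reports named groups, so the /chapter-one/ group with the surrounding slashes no longer shows up in the result.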
Upvotes: 1
Reputation: 18663
Use a "?" after something to make it optional.
>>> import re
>>> r = r"/course/([a-zA-Z]+[-a-zA-Z]*)(/[A-Za-z]+[-a-zA-Z]*)?"
>>> s = "/course/lesson-one/chapter-one/"
>>> re.match(r, s).groups()
('lesson-one', '/chapter-one')
>>> s = "/course/computers/"
>>> re.match(r, s).groups()
('computers', None)
Upvotes: 1