Reputation: 193
I have path strings like these:
tree/bee.horse_2021/moose/loo.se
bee.horse_2021/moose/loo.se
bee.horse_2021/mo.ose/loo.se
The path can be arbitrarily long after moose. Sometimes the first part of the path, such as tree/, is missing, sometimes not. I want to capture tree in the first group if it exists and bee.horse in the second.
I came up with this regex, but it doesn't work:
path_regex = r'^(?:(.*)/)?([a-zA-Z]+\.[a-zA-Z]+).+$'
What am I missing here?
Upvotes: 1
Views: 193
Reputation: 163352
You can restrict the characters to be matched in the first capture group.
For example, you could match any character except /, . or a newline using the negated character class [^/\n.]+
^(?:([^/\n.]+)/)?([a-zA-Z]+\.[a-zA-Z]+).*$
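To see what the restriction changes, here is a minimal Python sketch that runs the pattern from the question and the pattern above against the three sample paths. The variable names are only illustrative. In the original pattern the greedy (.*) runs up to the last /, so the second group ends up matching inside loo.se instead of bee.horse.

```python
import re

# Sample paths copied from the question.
paths = [
    "tree/bee.horse_2021/moose/loo.se",
    "bee.horse_2021/moose/loo.se",
    "bee.horse_2021/mo.ose/loo.se",
]

# Pattern from the question: the greedy (.*) can swallow "/" and ".".
original = re.compile(r'^(?:(.*)/)?([a-zA-Z]+\.[a-zA-Z]+).+$')
# Pattern with the negated character class in the first group.
restricted = re.compile(r'^(?:([^/\n.]+)/)?([a-zA-Z]+\.[a-zA-Z]+).*$')

original_groups = [original.match(p).groups() for p in paths]
restricted_groups = [restricted.match(p).groups() for p in paths]

for path, before, after in zip(paths, original_groups, restricted_groups):
    print(path)
    print("  original:  ", before)
    print("  restricted:", after)
```

For the first path the original pattern gives ('tree/bee.horse_2021/moose', 'loo.s'), while the restricted one gives ('tree', 'bee.horse'). When the leading part is missing, the first group is None.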
Or you can restrict the first group to word characters only, using \w+
^(?:(\w+)/)?([a-zA-Z]+\.[a-zA-Z]+).*$
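The two variants agree on the paths from the question, but they differ once the leading part contains something other than word characters. The sketch below assumes a hypothetical prefix my-tree/ (not from the question) to show this: \w does not match a hyphen, so the \w+ variant finds no match at all, while the negated character class still captures the prefix.

```python
import re

negated = re.compile(r'^(?:([^/\n.]+)/)?([a-zA-Z]+\.[a-zA-Z]+).*$')
word_only = re.compile(r'^(?:(\w+)/)?([a-zA-Z]+\.[a-zA-Z]+).*$')

# Path from the question: both variants give the same groups.
sample = "tree/bee.horse_2021/moose/loo.se"
same_negated = negated.match(sample).groups()
same_word = word_only.match(sample).groups()

# Hypothetical path with a hyphen in the first part (an assumption for illustration).
hyphenated = "my-tree/bee.horse_2021/moose/loo.se"
hyph_negated = negated.match(hyphenated)
hyph_word = word_only.match(hyphenated)

print(same_negated, same_word)
print(hyph_negated.groups() if hyph_negated else None)
print(hyph_word.groups() if hyph_word else None)
```

Which one fits better depends on what the first path component is allowed to contain.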
Note that in your pattern, the .+ at the end matches at least a single character. If you want to make that part optional, you can change it to .*
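A small sketch of that difference, assuming a hypothetical input bee.horse with nothing after the second group (this input is not in the question). With .+ the match still succeeds, but only because the second group backtracks and gives up its last character to the trailing .+.

```python
import re

with_plus = re.compile(r'^(?:(\w+)/)?([a-zA-Z]+\.[a-zA-Z]+).+$')
with_star = re.compile(r'^(?:(\w+)/)?([a-zA-Z]+\.[a-zA-Z]+).*$')

# Hypothetical input: nothing follows the part we want in group 2.
text = "bee.horse"

plus_group = with_plus.match(text).group(2)
star_group = with_star.match(text).group(2)

print(plus_group)  # bee.hors  (the final "e" was consumed by .+)
print(star_group)  # bee.horse
```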
Upvotes: 1