user106745

Reputation: 193

Python path regex optional match

I have path strings like these:

tree/bee.horse_2021/moose/loo.se
bee.horse_2021/moose/loo.se
bee.horse_2021/mo.ose/loo.se

The path can be arbitrarily long after moose. The first part of the path, such as tree/, is sometimes missing. I want to capture tree in the first group if it exists, and bee.horse in the second.

I came up with this regex, but it doesn't work:

path_regex = r'^(?:(.*)/)?([a-zA-Z]+\.[a-zA-Z]+).+$'

What am I missing here?

Upvotes: 1

Views: 193

Answers (1)

The fourth bird

Reputation: 163352

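In your pattern, the greedy (.*) can also match /, so the first group consumes everything up to the last slash in the path. You can restrict the characters to be matched in the first capture group.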

For example, you could match any character except /, . or a newline using the negated character class [^/\n.]+

^(?:([^/\n.]+)/)?([a-zA-Z]+\.[a-zA-Z]+).*$
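As a quick check, here is a minimal Python sketch that runs this pattern against the three paths from the question. The names PATH_REGEX, paths and results are only illustrative.

```python
import re

# Pattern from the answer: the optional first group may not contain "/" or ".".
PATH_REGEX = re.compile(r'^(?:([^/\n.]+)/)?([a-zA-Z]+\.[a-zA-Z]+).*$')

paths = [
    "tree/bee.horse_2021/moose/loo.se",
    "bee.horse_2021/moose/loo.se",
    "bee.horse_2021/mo.ose/loo.se",
]

results = []
for path in paths:
    match = PATH_REGEX.match(path)
    # group(1) is None when the optional leading directory is absent.
    results.append(match.groups() if match else None)

for path, groups in zip(paths, results):
    print(path, "->", groups)
```

When the leading directory is missing, the first group is None rather than an empty string, so test it with `is None` before using it.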


Or you can restrict the first group to word characters only, using \w+

^(?:(\w+)/)?([a-zA-Z]+\.[a-zA-Z]+).*$
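The two variants differ in which leading directory names they accept. The sketch below compares them in Python; the hyphenated path is a made-up input that does not appear in the question, and the helper name capture is only illustrative.

```python
import re

# The two patterns suggested in this answer.
NEGATED = re.compile(r'^(?:([^/\n.]+)/)?([a-zA-Z]+\.[a-zA-Z]+).*$')
WORD = re.compile(r'^(?:(\w+)/)?([a-zA-Z]+\.[a-zA-Z]+).*$')


def capture(pattern, path):
    """Return both capture groups, or None if the path does not match."""
    match = pattern.match(path)
    return match.groups() if match else None


plain = "tree/bee.horse_2021/moose/loo.se"
# Hypothetical path: the leading directory contains a hyphen, which \w does not match.
hyphenated = "my-tree/bee.horse_2021/moose/loo.se"

print(capture(NEGATED, plain))       # both patterns agree on the plain path
print(capture(WORD, plain))
print(capture(NEGATED, hyphenated))  # the negated class accepts the hyphen
print(capture(WORD, hyphenated))     # \w+ cannot cross the hyphen, so no match
```

If directory names may contain hyphens or other punctuation, the negated character class is probably the safer choice. \w+ is stricter and only accepts letters, digits and underscores.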


Note that in your pattern, the .+ at the end matches at least a single character. If you want to make that part optional, you can change it to .*
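The effect of the trailing quantifier shows up when nothing follows the second group. The sketch below uses two short made-up inputs, bee.horse and a.b, that are not in the question; the helper name second_group is only illustrative.

```python
import re

# Same pattern twice, differing only in the trailing quantifier.
WITH_PLUS = re.compile(r'^(?:([^/\n.]+)/)?([a-zA-Z]+\.[a-zA-Z]+).+$')
WITH_STAR = re.compile(r'^(?:([^/\n.]+)/)?([a-zA-Z]+\.[a-zA-Z]+).*$')


def second_group(pattern, path):
    """Return the second capture group, or None if the path does not match."""
    match = pattern.match(path)
    return match.group(2) if match else None


# Hypothetical inputs with nothing after the part captured by group 2.
short_path = "bee.horse"
tiny_path = "a.b"

# .+ needs one character, so the regex backtracks and takes it from group 2.
print(second_group(WITH_PLUS, short_path))
print(second_group(WITH_STAR, short_path))

# With only one letter after the dot there is nothing to give back.
print(second_group(WITH_PLUS, tiny_path))
print(second_group(WITH_STAR, tiny_path))
```

With .+ the match can still succeed by backtracking, but the second group then loses its last character, which is easy to overlook.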

Upvotes: 1
