Reputation: 4771
I have the following regular expression in Python:
m = re.match(r'(?P<name>[a-zA-Z0-9]+)(?P<limit>_\d+K)(?P<code>_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?.(?P<file_format>\w+)?',
file_name, re.M | re.I)
It is used to validate a file name against a specific pattern and I also take the values of the groups from the file name like so:
name = m.group('name')
This works fine if the file name is exactly in the given format like so:
file_name = "name_122K_someCode_someotherthing.jpg"
However, in some cases some of these parts may be missing from the file name. In those cases, obviously, the file name won't match the regexp and I won't be able to get the rest of the values from it.
How can I make those groups optional so that the pattern still matches when either or both of them are missing from the file name?
Upvotes: 0
Views: 88
Reputation: 163642
If some of the parts can be missing, you could make all the groups optional except the file format, if that should always be present.
As no spaces are matched, you could prepend the pattern with a positive lookahead ^(?=\S+\.\w+$) to assert that there is at least one non-whitespace character before the dot and the file_format group.
Note that the dot is escaped as \. to match it literally:
^(?=\S+\.\w+$)(?P<name>[a-zA-Z0-9]+)?(?P<limit>_\d+K)?(?P<code>_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?\.(?P<file_format>\w+)$
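A minimal sketch of how this pattern behaves, assuming it is compiled without any flags (the question passes re.M | re.I, which is not reproduced here). The helper name parse and the sample file names other than the one from the question are made up for illustration:

```python
import re

# Pattern copied from the answer above, split over several lines for readability.
PATTERN = re.compile(
    r'^(?=\S+\.\w+$)(?P<name>[a-zA-Z0-9]+)?(?P<limit>_\d+K)?'
    r'(?P<code>_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?'
    r'(_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?\.(?P<file_format>\w+)$'
)

def parse(file_name):
    """Return the named groups as a dict, or None if the name does not match."""
    m = PATTERN.match(file_name)
    return m.groupdict() if m else None

full = parse("name_122K_someCode_someotherthing.jpg")  # every part present
no_limit = parse("name_someCode.jpg")                  # limit part missing
no_ext = parse("name_122K_someCode")                   # no extension: rejected
```

A group that did not take part in the match comes back as None, so no_limit["limit"] is None while no_limit["code"] is still "_someCode". The name without an extension does not match at all, because both the lookahead and the final \.(?P<file_format>\w+) require it.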
If all the parts can be optional, you could assert a word character and make the whole part at the end of the string optional, including the dot.
^(?=\w)(?P<name>[a-zA-Z0-9]+)?(?P<limit>_\d+K)?(?P<code>_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?(?:\.(?P<file_format>\w+))?$
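As a rough illustration of the fully optional variant, the sketch below also drops the leading underscore that the limit and code groups capture. The helper name parse_optional and that clean-up step are assumptions added here, not part of the original answer:

```python
import re

# Pattern copied from the answer above; the extension is optional here.
PATTERN_ALL_OPTIONAL = re.compile(
    r'^(?=\w)(?P<name>[a-zA-Z0-9]+)?(?P<limit>_\d+K)?'
    r'(?P<code>_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?'
    r'(_[a-zA-Z0-9]+)?(_[a-zA-Z0-9]+)?(?:\.(?P<file_format>\w+))?$'
)

def parse_optional(file_name):
    """Return the named groups as a dict, or None if nothing matches."""
    m = PATTERN_ALL_OPTIONAL.match(file_name)
    if m is None:
        return None
    parts = m.groupdict()
    for key in ("limit", "code"):
        if parts[key] is not None:
            # Drop the leading "_" that the group captures.
            parts[key] = parts[key][1:]
    return parts

without_ext = parse_optional("name_122K")
without_limit = parse_optional("name_someCode.jpg")
only_ext = parse_optional(".jpg")  # rejected: (?=\w) needs a word character first
```

Since every group is optional, the caller has to check each value for None before using it.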
Edit: As suggested by @JvdV, you could shorten the pattern a bit by using a quantifier to capture all the values of the optional capturing groups in a single group:
^(?=\w)(?P<name>[a-zA-Z0-9]+)?(?P<limit>_\d+K)?(?P<code>_[a-zA-Z0-9]+)?((?:_[a-zA-Z0-9]+){0,4})(?:\.(?P<file_format>\w+))?$
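With the shortened pattern, the trailing parts arrive as one string such as "_a_b" rather than as separate groups. A small sketch of splitting that string back into its pieces follows; the helper name extra_parts and the sample file names are hypothetical:

```python
import re

# Shortened pattern copied from the answer above.
SHORT = re.compile(
    r'^(?=\w)(?P<name>[a-zA-Z0-9]+)?(?P<limit>_\d+K)?(?P<code>_[a-zA-Z0-9]+)?'
    r'((?:_[a-zA-Z0-9]+){0,4})(?:\.(?P<file_format>\w+))?$'
)

def extra_parts(file_name):
    """Return the trailing "_part" values as a list, or None if there is no match."""
    m = SHORT.match(file_name)
    if m is None:
        return None
    # Named groups are numbered too, so the unnamed group is group 4.
    # It always takes part in the match and is "" when no extra parts are present.
    return [p for p in m.group(4).split("_") if p]

two_extras = extra_parts("name_122K_someCode_a_b.jpg")
no_extras = extra_parts("name.jpg")
too_many = extra_parts("name_122K_c_1_2_3_4_5.jpg")  # more than 4 extra parts
```

The {0,4} quantifier keeps the same upper bound as the four separate optional groups, so a name with more trailing parts than that is still rejected.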
Upvotes: 2