Reputation: 97
I have the following string that will be part of a file name: [Cast1, Cast2, Cast 3]. The string is comma-delimited. It comes at the end of a film title and is preceded by either a - or a ~.
The filename would look like this:
(Studio) - Title (Year) ~ [Cast1, Cast2, Cast 3]
The section in bold could be optional.
I need a regex to get the following. I know this can be done with string splitting, but I need it as a regex.
I would like this to be in a named group. So far I have ((?P<CAST>([^,]+)))
But it includes the opening and closing brackets.
On top of this
Upvotes: 0
Views: 1459
Reputation: 44258
If I understand what you are looking for, try:
[-~]\s*\[(?P<CAST>[^\]]*)\]
[-~]
    Matches '-' or '~'.
\s*
    Matches zero or more whitespace characters.
\[
    Matches '['.
(?P<CAST>[^\]]*)
    Matches 0 or more characters that are not ']' and captures them in named capture group CAST.
\]
    Matches ']'.
So the above will capture whatever is between the '[' and ']' characters following a '-' or '~', whether those characters contain commas or not. You cannot have 3 capture groups identically named CAST. If you want the individual components of the cast, you will have to do it with string splitting:
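import re

s = '(Studio) - Title (Year) ~ [Cast1, Cast2, Cast 3]'
# Capture everything between '[' and ']' that follows a '-' or '~'.
m = re.search(r'[-~]\s*\[(?P<CAST>[^\]]*)\]', s)
if m:
    cast = m.group('CAST')
    # Split the captured text on commas, dropping the whitespace after each one.
    print(re.split(r',\s*', cast))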
Prints:
['Cast1', 'Cast2', 'Cast 3']
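Since the question says the bracketed section may be absent, here is a minimal sketch of the same search-then-split approach wrapped in a helper that returns an empty list when nothing matches. The names CAST_RE and extract_cast are made up for illustration and are not part of any library. The last few lines also show the builtin re module rejecting a pattern that reuses the group name CAST:

```python
import re

# Same pattern as above, compiled once (CAST_RE is a hypothetical name).
CAST_RE = re.compile(r'[-~]\s*\[(?P<CAST>[^\]]*)\]')


def extract_cast(filename):
    """Return the cast names as a list, or [] if there is no bracketed section."""
    m = CAST_RE.search(filename)
    if not m:
        return []
    # Split on commas, trim whitespace, and drop empty entries.
    return [name.strip() for name in m.group('CAST').split(',') if name.strip()]


print(extract_cast('(Studio) - Title (Year) ~ [Cast1, Cast2, Cast 3]'))
print(extract_cast('(Studio) - Title (Year)'))

# Reusing a group name within one pattern is an error in the builtin re module.
try:
    re.compile(r'(?P<CAST>\w+),\s*(?P<CAST>\w+)')
except re.error as exc:
    print('duplicate group name rejected:', exc)
```

With this shape, the calling code does not need to special-case file names that lack a cast list.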
If you were running Python 3, you could install the regex module from the PyPI repository, which has far more capabilities than the builtin re module, and then you could execute:
import regex

s = '(Studio) - Title (Year) ~ [Cast1, Cast2, Cast 3]'
for m in regex.finditer(r'(?:[-~]\s*\[|\G(?!\A))\K\s*(?P<CAST>[^,\]]*)(?:[,\]])', s):
    print(m['CAST'])
Prints:
Cast1
Cast2
Cast 3
But what does that buy you?
Upvotes: 1