ElektroStudios

Reputation: 20464

Fix RegEx to properly capture text inside parentheses

SCENARIO


Some time ago I asked a question about formatting music filenames under certain conditions.

However, I noticed too late that the accepted answer is wrong, because it can capture any word starting with "F". That is not what this question is about, though: I solved it simply by restoring the ft|feat|featuring alternation group.

So, starting from that earlier question, I ended up using this expression:

pattern := '^(.+)\s+-\s+(.+?)\s+(ft|feat|featuring)[\.\s]*([^([\])]+)(.+)?$' 
replace := '$1 Feat. $4 - $2$5' 
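
For reference, here is a minimal Python sketch of how that pattern and replacement behave. Python is used only for illustration: the IGNORECASE flag and the rename helper are assumptions of this sketch, not part of the pascal-script setup, and Python writes group references as \g<n> instead of $n.

```python
import re

# The pattern from the post, unchanged. IGNORECASE is an assumption made here
# so that the lowercase "feat" alternative also matches "Feat." in filenames.
PATTERN = re.compile(
    r'^(.+)\s+-\s+(.+?)\s+(ft|feat|featuring)[\.\s]*([^([\])]+)(.+)?$',
    re.IGNORECASE,
)

# Same replacement as '$1 Feat. $4 - $2$5', in Python's \g<n> notation.
REPLACEMENT = r'\g<1> Feat. \g<4> - \g<2>\g<5>'


def rename(filename: str) -> str:
    """Hypothetical helper: a non-matching filename is returned unchanged."""
    return PATTERN.sub(REPLACEMENT, filename)


print(rename('Black Coast - Trndsttr Feat. M. Maggie'))
# Black Coast Feat. M. Maggie - Trndsttr
print(rename('Black Coast - Trndsttr (Feat. M. Maggie)'))
# Black Coast - Trndsttr (Feat. M. Maggie)
```

The second call leaves the name untouched because the pattern expects ft|feat|featuring right after whitespace, and here an opening parenthesis sits in between.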

Now, given these test filenames:

  1. Black Coast - Trndsttr
  2. Black Coast - Trndsttr (Feather)
  3. Black Coast - Trndsttr (Lucian Remix)
  4. Black Coast - Trndsttr (Feather) (Lucian Remix)
  5. Black Coast - Trndsttr Feat. M. Maggie
  6. Black Coast - Trndsttr (Feat. M. Maggie)
  7. Black Coast - Trndsttr Feat. M. Maggie (Lucian Remix)
  8. Black Coast - Trndsttr (Feat. M. Maggie) (Lucian Remix)
  9. Black Coast - Trndsttr (Lucian Remix) Feat. M. Maggie
  10. Black Coast - Trndsttr (Lucian Remix) (Feat. M. Maggie)
  11. Black Coast - Trndsttr (Feather) (Lucian Remix) Feat. M. Maggie
  12. Black Coast - Trndsttr (Feather) (Lucian Remix) (Feat. M. Maggie)
  13. Black Coast - Trndsttr (Feather) Feat. M. Maggie (Lucian Remix)
  14. Black Coast - Trndsttr (Feather) (Feat. M. Maggie) (Lucian Remix)
  15. Black Coast - Trndsttr (Feather) (Feat. M. Maggie) Lucian Remix
  16. Black Coast - Trndsttr (Feather) Feat. M. Maggie Lucian Remix

The expected results are these:

(Filenames 1 to 4 remain unchanged; 16 is an acceptable false positive, since it is in essence the same case as 5, 9 and 11.)

  1. Black Coast - Trndsttr
  2. Black Coast - Trndsttr (Feather)
  3. Black Coast - Trndsttr (Lucian Remix)
  4. Black Coast - Trndsttr (Feather) (Lucian Remix)
  5. Black Coast Feat. M. Maggie - Trndsttr
  6. Black Coast Feat. M. Maggie - Trndsttr
  7. Black Coast Feat. M. Maggie - Trndsttr (Lucian Remix)
  8. Black Coast Feat. M. Maggie - Trndsttr (Lucian Remix)
  9. Black Coast Feat. M. Maggie - Trndsttr (Lucian Remix)
  10. Black Coast Feat. M. Maggie - Trndsttr (Lucian Remix)
  11. Black Coast Feat. M. Maggie - Trndsttr (Feather) (Lucian Remix)
  12. Black Coast Feat. M. Maggie - Trndsttr (Feather) (Lucian Remix)
  13. Black Coast Feat. M. Maggie - Trndsttr (Feather) (Lucian Remix)
  14. Black Coast Feat. M. Maggie - Trndsttr (Feather) (Lucian Remix)
  15. Black Coast Feat. M. Maggie - Trndsttr (Feather) Lucian Remix
  16. Black Coast Feat. M. Maggie Lucian Remix - Trndsttr (Feather)

PROBLEM


The expression that I mentioned works perfectly for all the filenames except those where the Feat... part is enclosed in parentheses (or brackets, whatever).

Then I tried to adapt the RegEx to handle those enclosing groups:

pattern := '^(.+)\s+-\s+(.+?)\s+([\[\(\{])?\s*(ft|feat|featuring([\.])?\s+)((.+)[^\]\)\}])?\s*(.+)?$'

But I ended up confusing myself and missing things, because it also captures the first parenthesized group and all the characters that follow it up to the end.

I need some help with this.

QUESTION


How could I fix or improve my expression so that the filenames listed above produce the expected results?

In other words, I need to keep the "structure" of the expression while adding the ability to capture the Feat... part when it is inside parentheses/brackets, so that the filename is formatted properly.

PS: Please remember that I'm limited to pascal-script's RegEx syntax and its limitations (which I'm not sure about).

IMPORTANT EDIT:

I discovered that the software with these limitations supports running an external app from its pascal-script editor, so I can launch a CLI app written in .NET to perform the regex replacement. That means I can now rely on the improvements of the C#/VB.NET RegEx engine, nice!

Upvotes: 3

Views: 107

Answers (1)

Jan

Reputation: 43169

Something like:

^(?P<artist>.+?(?=\s-\s))          # artist with pos. lookahead
\s-\s                              # space - space
(?P<title>.+?(?=(?:\(?Feat\.)|$))  # title with pos. lookahead 
\(?                                # optional open parenthesis
    (?P<artist2>Feat\.[^()\n]+)?   # artist2 with Feat. before
\)?                                # optional closing parenthesis
(?P<subtitle>.+)?$                 # optional subtitle

See a demo on regex101.com.
The problem is that the dashes do not always match (maybe some additional programming logic is needed?).
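
One way that additional logic might look, as a rough Python sketch rather than a definitive solution: the pattern is the one above, used unchanged in verbose mode, while the reformat helper and its whitespace handling are assumptions of this sketch. Like the pattern itself, it only recognises the literal "Feat." spelling.

```python
import re

# The pattern from this answer, used as-is in verbose mode.
PATTERN = re.compile(r'''
    ^(?P<artist>.+?(?=\s-\s))          # artist with pos. lookahead
    \s-\s                              # space - space
    (?P<title>.+?(?=(?:\(?Feat\.)|$))  # title with pos. lookahead
    \(?                                # optional open parenthesis
        (?P<artist2>Feat\.[^()\n]+)?   # artist2 with Feat. before
    \)?                                # optional closing parenthesis
    (?P<subtitle>.+)?$                 # optional subtitle
''', re.VERBOSE)


def reformat(filename: str) -> str:
    """Hypothetical helper: move the 'Feat. ...' part next to the artist."""
    match = PATTERN.match(filename)
    if match is None or match.group('artist2') is None:
        return filename  # nothing to move, keep the name unchanged
    artist = f"{match.group('artist')} {match.group('artist2').strip()}"
    # Join title and subtitle, dropping empty parts and stray spaces.
    parts = (match.group('title'), match.group('subtitle') or '')
    title = ' '.join(p.strip() for p in parts if p.strip())
    return f'{artist} - {title}'


print(reformat('Black Coast - Trndsttr (Feat. M. Maggie) (Lucian Remix)'))
# Black Coast Feat. M. Maggie - Trndsttr (Lucian Remix)
print(reformat('Black Coast - Trndsttr (Feather)'))
# Black Coast - Trndsttr (Feather)
```

Building the result from the named groups in code, instead of a single replacement string, avoids the doubled or missing spaces that appear when a group is empty or carries trailing whitespace.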

Upvotes: 2
