Reputation: 495
This question follows on from a previous question about If-Then-Else Regular Expressions.
Because of how I phrased my problem in the other question, the solutions didn't use the (?(A)X|Y) syntax. But I think I need to use that approach.
Here is my problem re-phrased...
I need a regex that takes as input a string representing a filename.
Here are my test strings...
The Edge Of Seventeen 2016 720p.mp4
20180511 2314 - Film4 - Northern Soul.ts
20150526 2059 - BBC Four - We Need to Talk About Kevin.ts
If the filename matches this regex:
\d{8} \d{4} -.*?- .*?\.ts
Then this RegEx should be applied:
\d{8} \d{4} -.*?- ?(.*)\.ts
If the filename does not match that first regex then this regex should be applied to it:
(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?
This is the expected output...
Test string:
The Edge Of Seventeen 2016 720p.mp4
Expected output:
"The Edge Of Seventeen 2016 " (quotes only included to show that a trailing space can be left at the end)
Test String:
20180511 2314 - Film4 - Northern Soul.ts
Expected output:
Northern Soul
Test String:
20150526 2059 - BBC Four - We Need to Talk About Kevin.ts
Expected output:
We Need to Talk About Kevin
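For comparison, the if/then/else described above can also be expressed in host code instead of inside a single pattern. This is only a minimal sketch, assuming Python's built-in re module rather than PCRE; the function name extract_two_step is made up for illustration, and the three patterns are copied unchanged from the question.

```python
import re

# A: decides which branch applies (copied from the question).
A = re.compile(r"\d{8} \d{4} -.*?- .*?\.ts")
# X: applied when A matches; capture group 1 holds the title.
X = re.compile(r"\d{8} \d{4} -.*?- ?(.*)\.ts")
# Y: applied otherwise; the whole match is the title plus the year.
Y = re.compile(
    r"(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+"
    r"(19[0-9][0-9]|20[0-9][0-9])"
    r"([ _\,\.\(\)\[\]\-]|[^0-9]$)?"
)


def extract_two_step(filename):
    """Return the extracted text, or None if neither branch matches."""
    if A.search(filename):
        m = X.search(filename)
        return m.group(1) if m else None
    m = Y.search(filename)
    return m.group(0) if m else None


for name in (
    "The Edge Of Seventeen 2016 720p.mp4",
    "20180511 2314 - Film4 - Northern Soul.ts",
    "20150526 2059 - BBC Four - We Need to Talk About Kevin.ts",
):
    print(repr(extract_two_step(name)))
```

With the three test strings this should reproduce the expected outputs listed above, including the trailing space after 2016.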
Here is what I have tried to make the If-Then-Else Regex but it doesn't work:
I use this format --> (?(A)X|Y)
(?(\d{8} \d{4} -.*?- .*?\.ts)\d{8} \d{4} -.*?- ?(.*)\.ts|(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?)
This is A
\d{8} \d{4} -.*?- .*?\.ts
This is X
\d{8} \d{4} -.*?- ?(.*)\.ts
This is Y
(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?
I have tested the A, X and Y regexes and they work individually, but not when I put them together. Can someone help me piece them together using the PCRE standard?
Cheers,
Flex
Upvotes: 1
Views: 336
Reputation: 626690
You may use
^\d{8} \d{4} -.*?- ?\K.*(?=\.ts$)|^.*[^][ _,.()-][][ _.()-]+(?:19|20)\d{2}(?!\d)
See the regex demo
The pattern is a combination of two alternatives and, as in any NFA regex, the first alternative that matches "wins" and the regex engine stops analyzing the rest of the alternatives on that level:
^\d{8} \d{4} -.*?- ?\K.*(?=\.ts$) - matches
  ^ - start of string
  \d{8} \d{4} - - 8 digits, space, four digits, space and then a hyphen
  .*? - 0+ chars other than line breaks, as few as possible
  - ? - a hyphen and an optional space
  \K - match reset operator that discards the text matched so far from the match buffer
  .* - any 0+ chars other than line break chars, as many as possible
  (?=\.ts$) - this positive lookahead requires .ts and the end of string position immediately to the right of the current position.
| - or, if the above alternative does not match, try
  ^ - start of string
  .* - any 0+ chars other than line break chars, as many as possible
  [^][ _,.()-] - a char other than ], [, space, _, comma, ., (, ) and -
  [][ _.()-]+ - 1+ ], [, space, _, ., (, ) and - chars
  (?:19|20) - a 19 or 20 substring
  \d{2}(?!\d) - two digits, not followed by another digit.
Upvotes: 1
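The alternation pattern from the answer above relies on \K, which Python's built-in re module does not support. As a rough sketch only, the same idea can be tried there by swapping \K for capture groups, one per alternative. The function name extract_with_alternation is hypothetical, and the brackets inside the character classes are escaped here for readability.

```python
import re

# Same two alternatives as the PCRE pattern, but with capture groups
# standing in for \K (an adaptation, not the original pattern).
PATTERN = re.compile(
    r"^\d{8} \d{4} -.*?- ?(.*)(?=\.ts$)"  # group 1: title of a recording
    r"|"
    r"^(.*[^\]\[ _,.()-][\]\[ _.()-]+(?:19|20)\d{2})(?!\d)"  # group 2: title + year
)


def extract_with_alternation(filename):
    """Return whichever group took part in the match, or None."""
    m = PATTERN.search(filename)
    if m is None:
        return None
    # Only one of the two groups participates, depending on which
    # alternative matched.
    return m.group(1) if m.group(1) is not None else m.group(2)


print(repr(extract_with_alternation("The Edge Of Seventeen 2016 720p.mp4")))
print(repr(extract_with_alternation("20180511 2314 - Film4 - Northern Soul.ts")))
```

Note that the second alternative stops right after the year, so the movie filename comes back without a trailing space.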
Reputation: 2245
Continuing from my answer here -> How to make an If Then Else Regex conditional statement, the same method is applicable still. I have tested it on the Java engine.
One difference that will help you is to name the groups whose values you are interested in. E.g., I have rewritten the regexes below with named groups (lowercase) x and y. Once the engine has finished matching, you can check the value of match group x and then, if there is nothing for group x, the value of group y.
Regex X: \d{8} \d{4} -.*?- ?(?<x>.*)\.ts
Regex Y: (?<y>(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9]))([ _\,\.\(\)\[\]\-]|[^0-9]$)?
You will have to choose the right group for y, as I don't think I have done that part correctly.
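A possible way to try the named-group idea, sketched in Python rather than Java: Python's re module spells named groups (?P<x>...) instead of (?<x>...). Joining Regex X and Regex Y with a plain | is my assumption about how the two are meant to be combined, and extract_named is a made-up helper name.

```python
import re

# Regex X and Regex Y from the answer, with Python's (?P<name>...) syntax.
X = r"\d{8} \d{4} -.*?- ?(?P<x>.*)\.ts"
Y = (
    r"(?P<y>(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9]))"
    r"([ _\,\.\(\)\[\]\-]|[^0-9]$)?"
)
# Assumption: the two regexes are combined by simple alternation.
COMBINED = re.compile("(?:" + X + ")|(?:" + Y + ")")


def extract_named(filename):
    """Check group x first, then fall back to group y."""
    m = COMBINED.search(filename)
    if m is None:
        return None
    if m.group("x") is not None:
        return m.group("x")
    return m.group("y")


print(repr(extract_named("20180511 2314 - Film4 - Northern Soul.ts")))
print(repr(extract_named("The Edge Of Seventeen 2016 720p.mp4")))
```

As written, group y ends at the year, so it does not include the optional trailing separator; moving the closing parenthesis of y would change that.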
Upvotes: 1