FlexMcMurphy

Conditional If Then Else Regex statement

This question follows on from a previous question about If-Then-Else Regular Expressions.

Because of how I phrased my problem in the other question, the solutions there didn't use the (?(A)X|Y) syntax. But I think I need to use that approach.

Here is my problem re-phrased...

I need a regex that takes as input a string representing a filename.

Here are my test strings...

The Edge Of Seventeen 2016 720p.mp4
20180511 2314 - Film4 - Northern Soul.ts
20150526 2059 - BBC Four - We Need to Talk About Kevin.ts

If the filename matches this regex:

\d{8} \d{4} -.*?- .*?\.ts

Then this RegEx should be applied:

\d{8} \d{4} -.*?- ?(.*)\.ts

If the filename does not match that first regex then this regex should be applied to it:

(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?

This is the expected output...

Test string: The Edge Of Seventeen 2016 720p.mp4
Expected output: "The Edge Of Seventeen 2016 " (the quotes are included only to show that a trailing space may be left at the end)

Test String: 20180511 2314 - Film4 - Northern Soul.ts
Expected output: Northern Soul

Test String: 20150526 2059 - BBC Four - We Need to Talk About Kevin.ts
Expected output: We Need to Talk About Kevin
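For comparison, here is a minimal Python sketch (not PCRE) that does the if/then/else in host code rather than inside the regex: test A, then apply X and keep group 1, otherwise apply Y and keep the whole match. It assumes each filename is the whole input string; the function name `extract_title_if_else` is hypothetical.

```python
import re
from typing import Optional

# The three patterns from the question, unchanged apart from being raw strings.
A = re.compile(r"\d{8} \d{4} -.*?- .*?\.ts")   # condition: looks like a TV recording
X = re.compile(r"\d{8} \d{4} -.*?- ?(.*)\.ts")  # "then": group 1 is the programme title
Y = re.compile(                                 # "else": film name, year, optional separator
    r"(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+"
    r"(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?"
)


def extract_title_if_else(filename: str) -> Optional[str]:
    """Apply X if A matches, otherwise apply Y (hypothetical helper)."""
    if A.search(filename):
        m = X.search(filename)
        return m.group(1) if m else None
    m = Y.search(filename)
    return m.group(0) if m else None


if __name__ == "__main__":
    for name in [
        "The Edge Of Seventeen 2016 720p.mp4",
        "20180511 2314 - Film4 - Northern Soul.ts",
        "20150526 2059 - BBC Four - We Need to Talk About Kevin.ts",
    ]:
        print(repr(extract_title_if_else(name)))
```

Run against the three test strings, this should print 'The Edge Of Seventeen 2016 ' (with the trailing space), 'Northern Soul' and 'We Need to Talk About Kevin', matching the expected output above.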

Here is my attempt at the If-Then-Else regex, but it doesn't work:

I use this format --> (?(A)X|Y)

(?(\d{8} \d{4} -.*?- .*?\.ts)\d{8} \d{4} -.*?- ?(.*)\.ts|(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?)

This is A

\d{8} \d{4} -.*?- .*?\.ts

This is X

\d{8} \d{4} -.*?- ?(.*)\.ts

This is Y

(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?

I have tested the A, X and Y regexes and they work individually, but not when I put them together. Can someone help me piece them together using the PCRE flavor?

Cheers,

Flex

Answers (2)

Wiktor Stribiżew

You may use

^\d{8} \d{4} -.*?- ?\K.*(?=\.ts$)|^.*[^][ _,.()-][][ _.()-]+(?:19|20)\d{2}(?!\d)

See the regex demo

The pattern consists of two alternatives. As in any NFA regex engine, the first alternative that matches "wins", and the engine stops trying the remaining alternatives on that level:

  • ^\d{8} \d{4} -.*?- ?\K.*(?=\.ts$) - matches
    • ^ - start of string
    • \d{8} \d{4} - - 8 digits, space, four digits, space and then -
    • .*? - 0+ chars other than line breaks as few as possible
    • - ? - a - followed by an optional space
    • \K - the match reset operator, which discards the text matched so far from the reported match
    • .* - any 0+ chars other than line break chars, as many as possible
    • (?=\.ts$) - this positive lookahead requires .ts followed by the end of the string immediately to the right of the current position.
  • | - or, if the above alternative does not match, try
    • ^ - start of a string
    • .* - any 0+ chars other than line break chars, as many as possible
    • [^][ _,.()-] - a char other than ], [, space, _, comma, ., (, ) and -
    • [][ _.()-]+ - 1+ ], [, space, _, ., (, ) and - chars
    • (?:19|20) - 19 or 20 substring
    • \d{2}(?!\d) - two digits, not followed with another digit.
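Python's built-in re module does not support \K, so a direct port needs another way to drop the prefix. A minimal sketch, assuming the same two alternatives: the text that \K would keep is captured in a named group instead (the group names tv and film and the function name extract_title are hypothetical), and the brackets inside the character classes are escaped so the classes read unambiguously in Python.

```python
import re
from typing import Optional

# Same two alternatives as the PCRE pattern above, with \K replaced by a
# named capturing group because Python's re has no match-reset operator.
TITLE = re.compile(
    r"^\d{8} \d{4} -.*?- ?(?P<tv>.*)(?=\.ts$)"                       # 1st alternative
    r"|^(?P<film>.*[^\]\[ _,.()-][\]\[ _.()-]+(?:19|20)\d{2})(?!\d)"  # 2nd alternative
)


def extract_title(filename: str) -> Optional[str]:
    m = TITLE.search(filename)
    if m is None:
        return None
    # Only the alternative that actually matched sets its group; the other is None.
    return m.group("tv") if m.group("tv") is not None else m.group("film")
```

Note that the film result here is "The Edge Of Seventeen 2016" without a trailing space, which the question says is acceptable.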

Sree Kumar

Continuing from my answer to How to make an If Then Else Regex conditional statement: the same method still applies. I have tested it on the Java engine.

One change that will help you is to name the groups whose values you are interested in. For example, I have rewritten the regexes below with the named groups x and y (lowercase). Once the engine has finished matching, check the value of group x first and, if group x matched nothing, check group y.

Regex X: \d{8} \d{4} -.*?- ?(?<x>.*)\.ts

Regex Y: (?<y>(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9]))([ _\,\.\(\)\[\]\-]|[^0-9]$)?

You will have to choose the right group for y, as I am not sure I have done that part correctly.
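A rough Python sketch of the "check x, then y" idea, assuming X and Y are joined with | into a single pattern (an assumption about how the earlier method combines them). Python spells named groups (?P<name>...) rather than Java's (?<name>...); the function name title_from_groups is hypothetical.

```python
import re
from typing import Optional

# X and Y from this answer, converted to Python's (?P<name>...) syntax.
X = r"\d{8} \d{4} -.*?- ?(?P<x>.*)\.ts"
Y = (
    r"(?P<y>(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9]))"
    r"([ _\,\.\(\)\[\]\-]|[^0-9]$)?"
)
COMBINED = re.compile(X + "|" + Y)  # assumption: the two regexes are alternated


def title_from_groups(filename: str) -> Optional[str]:
    m = COMBINED.search(filename)
    if m is None:
        return None
    # Check group x first; fall back to group y only if x did not participate.
    if m.group("x") is not None:
        return m.group("x")
    return m.group("y")
```

With y wrapped around the name and the year as above, the film string gives "The Edge Of Seventeen 2016"; moving the y group boundaries would change which part is returned.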
