Capsup

Reputation: 87

Matching different digits after a lookahead

Say I have the following:

Regex: (?<=([sS][\d]{1,2}[eE]))(?(?=0)[1-9]{1}|[\d]{2})
Text: FooBar.S02E03.foo

How do I get the first branch of the conditional to match only the '3' of the E03?

What I'm after is a regex that matches only the second digit after the 'E' if the first digit after the 'E' is 0, and otherwise matches both digits after the 'E'.

Upvotes: 0

Views: 114

Answers (4)

HMK

Reputation: 574

@Tim Pietzcker's solution is good, but it does not work in regex engines that lack support for quantifiers inside a lookbehind. The pattern below is for engines with that limitation: it uses a non-capturing group and \K to work around it, so the engine does need to support \K.

(?<=[sS])(?:\d{1,2}[eE](0)?)\K(?(1)[1-9]|[1-9]\d)

Upvotes: 0

beans

Reputation: 116

Taking what you had originally, and adding to it, you could have:

(?<=([sS]))([\d]{1,2}[eE])(?(?=0)[0][1-9]{1}|[\d]{2})

I've taken the \d out of the lookbehind, since a lookbehind that has to check for either 1 or 2 digits is not possible in many engines. Then, in the second part, I've added the zero before the [1-9] in the conditional to complete the regex.

I've tested this with the inputs:

S02E03 S02E14

on http://regex101.com/, which is a very handy tool for checking a regex before using it.
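For anyone who wants to check the same two inputs outside regex101, here is a rough Python sketch. Python's re module does not support the (?(?=0)...) lookahead conditional, so the sketch swaps in an alternation that should behave the same way; that substitution, and the names PATTERN and full_match, are assumptions made for illustration. Since the season digits and the 'E' are no longer inside the lookbehind, they become part of the match.

```python
import re

# Assumption: (?(?=0)0[1-9]|\d{2}) is rewritten as (?:0[1-9]|[1-9]\d),
# because Python's re has no lookahead conditionals.
PATTERN = re.compile(r"(?<=[sS])\d{1,2}[eE](?:0[1-9]|[1-9]\d)")


def full_match(text):
    """Return the text consumed by the pattern, or None if nothing matches."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


for sample in ("S02E03", "S02E14"):
    # The match includes the season digits and the 'E', not only the episode.
    print(sample, "->", full_match(sample))
```

So both inputs match, but the match is the whole "02E03" or "02E14" rather than only the episode digits.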

Upvotes: 0

Tim Pietzcker

Reputation: 336188

The following works - is that what you're looking for? This will always stop the match after the second digit (if that's not necessary, it's unclear why you would need such a complicated regex in the first place).

(?<=[sS]\d{1,2}[eE](0)?)(?(1)[1-9]|[1-9]\d)

Explanation:

(?<=             # Lookbehind:
 [sS]\d{1,2}[eE] # SddE
 (0)?            # Match a zero if present
)                # End of lookbehind
(?(1)            # If the zero matched...
 [1-9]           # match a single digit (1-9)
|                # If it didn't match...
 [1-9]\d         # match two digits (1-9 and another one)
)                # End of conditional
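The same conditional can be tried in Python, with one change: Python's re module only accepts fixed-width lookbehind, so this sketch drops the lookbehind and puts the conditional in a capturing group instead. That restructuring, and the names EPISODE and episode_digits, are assumptions for illustration and not part of the pattern above.

```python
import re

# Assumption: no lookbehind here (Python's re needs fixed-width lookbehind),
# so the wanted digits are read from group 2 instead of the overall match.
EPISODE = re.compile(
    r"[sS]\d{1,2}[eE]"           # SddE
    r"(0)?"                      # group 1: a leading zero, if present
    r"((?(1)[1-9]|[1-9]\d))"     # group 2: one digit if group 1 matched, else two
)


def episode_digits(text):
    """Return the episode digits without a leading zero, or None."""
    m = EPISODE.search(text)
    return m.group(2) if m else None


print(episode_digits("FooBar.S02E03.foo"))  # 3
print(episode_digits("FooBar.S02E14.foo"))  # 14
```

An episode of "00" gives None here, because neither branch of the conditional accepts a zero in the first matched position.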

Upvotes: 2

tripflag

Reputation: 245

You can skip all of the complexity of if-checks. Simply eat any leading zeros after S or E like this:

echo FooBar.S02E03.foo | sed -r 's/(.*)\.S0*([0-9]{1,2})E0*([0-9]{1,2})\.(.*)/title:\1, season:\2, episode:\3, postfix:\4/'

title:FooBar, season:2, episode:3, postfix:foo
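If sed is not available, the same substitution can be sketched in Python with re.sub. The pattern and replacement are carried over unchanged from the sed command; the names PATTERN and describe are made up for this example.

```python
import re

# Same pattern as the sed command: 0* eats leading zeros before each number.
PATTERN = re.compile(r"(.*)\.S0*([0-9]{1,2})E0*([0-9]{1,2})\.(.*)")


def describe(name):
    """Rewrite a file name into the title/season/episode/postfix form."""
    return PATTERN.sub(r"title:\1, season:\2, episode:\3, postfix:\4", name)


print(describe("FooBar.S02E03.foo"))
# title:FooBar, season:2, episode:3, postfix:foo
```

A name that does not fit the pattern is returned unchanged, since re.sub leaves non-matching input alone.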

Upvotes: 0
