Rumi P.

Reputation: 1737

Regex: an optional substring that can appear in one of two places, but not both

I am validating a string with a regex, PCRE flavor. I have a substring that can optionally appear in one of two possible places - but not both. How do I write a regex for that?

The regex without the substring is

M[01]([ ]*\(?[A-Z]{3}\)?)?

The substring has the regex C[0-5] and can come either before or after the parentheses, or not be present at all. It can be separated by whitespace or not.

Valid examples (all including whitespace for legibility, but the same ones without whitespace are also valid):

M1
M1 C1
M1 (OSS)
M1 C1 (OSS)
M1 (OSS) C1

Invalid examples:

M1 C1 (OSS) C1

The closest thing I came up with is

M[01]([ ]*C[1-5]?)([ ]*\(?[A-Z]{3}\)?)?([ ]*C[1-5]?)

but this will also accept the invalid example. Since I only have two positions, I could of course enumerate the different combinations, but I dislike that solution because it does not scale well to more possible positions.

If that matters, this is a group that will be present in a longer string to be validated, so the regex will be embedded in a larger one as a subroutine.

Upvotes: 1

Views: 249

Answers (2)

The fourth bird

Reputation: 163362

Using PCRE, another option is a conditional that tests whether capture group 1 took part in the match. A conditional has the form:

(?(1)foo|bar)

For the example data, you could make all 3 parts optional, where the first part is a capturing group. If group 1 matched, the conditional matches nothing; if group 1 did not match, the conditional optionally matches the trailing C part.

^M[01](\h*C[1-5])?(?:\h*\([A-Z]{3}\))?(?(1)|(?:\h*C[1-5])?)$

Explanation

  • ^ Start of string
  • M[01] Match M and either 0 or 1
  • ( Capture group 1
    • \h*C[1-5] match 0+ horizontal whitespace chars and C with digit 1-5
  • )? Close group 1 and make it optional
  • (?: Non capture group
    • \h*\([A-Z]{3}\) Match 0+ horizontal whitespace chars and A-Z 3 times between parentheses
  • )? Close group and make it optional
  • (? If clause
    • (1) Test if capture group 1 exists. If it does, do nothing
    • | Or
    • (?:\h*C[1-5])? Optionally match 0+ horizontal whitespace chars and C with digit 1-5
  • ) Close if clause
  • $ End of string
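
As a quick sanity check, here is a minimal Python sketch that runs the pattern against the examples from the question. Python's re module supports the same (?(1)...|...) conditional but not \h, so [ \t] is substituted here as an assumption; the name is_valid_conditional and the two example lists are only illustrative.

```python
import re

# Python's re has no \h, so [ \t] stands in for it (an assumption; PCRE's \h
# also covers other horizontal whitespace characters).
CONDITIONAL_PATTERN = re.compile(
    r"^M[01]([ \t]*C[1-5])?(?:[ \t]*\([A-Z]{3}\))?(?(1)|(?:[ \t]*C[1-5])?)$"
)

def is_valid_conditional(text: str) -> bool:
    """Return True when the string matches the conditional-based pattern."""
    return CONDITIONAL_PATTERN.match(text) is not None

# Examples taken from the question.
VALID_EXAMPLES = ["M1", "M1 C1", "M1 (OSS)", "M1 C1 (OSS)", "M1 (OSS) C1"]
INVALID_EXAMPLES = ["M1 C1 (OSS) C1"]

for example in VALID_EXAMPLES + INVALID_EXAMPLES:
    compact = example.replace(" ", "")  # the same example without whitespace
    print(f"{example!r}: {is_valid_conditional(example)}, "
          f"{compact!r}: {is_valid_conditional(compact)}")
```

With these inputs, each valid example (with or without whitespace) should report True, and the example with a C part in both positions should report False.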


Note that in the pattern that you tried, the opening and closing parentheses are each optional (\(? and \)?), so it could also match unbalanced input such as M1 (OSS or M1 OSS). I am not sure whether that is intended, so in the pattern above both parentheses are required.
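
To illustrate the point about optional parentheses, here is a small Python sketch using the base pattern from the question unchanged. The helper name base_accepts and the sample strings are only illustrative.

```python
import re

# The base pattern from the question; both parentheses are optional here.
BASE_PATTERN = re.compile(r"M[01]([ ]*\(?[A-Z]{3}\)?)?")

def base_accepts(text: str) -> bool:
    """Return True when the whole string matches the question's base pattern."""
    return BASE_PATTERN.fullmatch(text) is not None

# Balanced, missing the closing one, missing the opening one, missing both.
for sample in ["M1 (OSS)", "M1 (OSS", "M1 OSS)", "M1 OSS"]:
    print(f"{sample!r}: {base_accepts(sample)}")
```

All four samples are accepted, because nothing in the pattern ties the presence of one parenthesis to the other.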

Upvotes: 0

CertainPerformance

Reputation: 370769

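One option is to capture the C in a capture group when the first C part is (possibly) matched. Then, at the second possible location of the C part, use a negative lookahead for the first capture group before matching it. If group 1 did not participate in the match, the backreference fails, so the lookahead succeeds and the second C part is allowed: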

^M[01](?: *(C)[1-5])? *(?:\(?[A-Z]{3}\)?(?: *(?!\1)C[1-5])?)?$
           ^^^                                ^^^^^

https://regex101.com/r/xCxSn4/1

Note that if you want to match a plain space, you can just use a plain space in the pattern; there is no need for a character class. For example, ([ ]) is equivalent to ( ).
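
A minimal Python sketch to check the lookahead pattern against the examples from the question. This assumes Python's re behaves like PCRE here, in that a backreference to a group that did not participate fails to match; the name is_valid_lookahead and the example lists are only illustrative.

```python
import re

# Same pattern as above. Group 1 holds "C" only if the first C part matched;
# (?!\1) then blocks a second C part at the later position.
LOOKAHEAD_PATTERN = re.compile(
    r"^M[01](?: *(C)[1-5])? *(?:\(?[A-Z]{3}\)?(?: *(?!\1)C[1-5])?)?$"
)

def is_valid_lookahead(text: str) -> bool:
    """Return True when the string matches the lookahead-based pattern."""
    return LOOKAHEAD_PATTERN.match(text) is not None

# Examples taken from the question.
LOOKAHEAD_VALID = ["M1", "M1 C1", "M1 (OSS)", "M1 C1 (OSS)", "M1 (OSS) C1"]
LOOKAHEAD_INVALID = ["M1 C1 (OSS) C1"]

for example in LOOKAHEAD_VALID + LOOKAHEAD_INVALID:
    print(f"{example!r}: {is_valid_lookahead(example)}")
```

The valid examples should report True and the doubled-C example False. Because the second C part is nested inside the parenthesised group in this pattern, a trailing C part is only tried after an [A-Z]{3} block.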

Upvotes: 1
