Justin

Reputation: 2049

RegExp Named-Groups & Conditional statements - Is there an "IF con1 OR con2 THEN (pattern)"?

I'm trying to write a semi-advanced RegExp pattern to parse out some "macros" in some text. The pattern uses Named Groups and Conditional Statements.

A basic example of using both of them together would be something like:

(?<test>a)?b(?(test)c|d)

The first part (before the b) tries to match the letter a and, if it matches, captures it in the named group test.

The second part (after the b) is the conditional statement, which basically reads:

If test was matched, then look for c, otherwise, look for d
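As a quick check of that reading, here is a minimal Python sketch. It assumes Python's built-in re module, which only accepts the (?P<test>...) spelling for the named group; the helper name matches is made up for illustration.

```python
import re

# Same pattern as above, using Python's (?P<name>...) spelling for the group.
PATTERN = re.compile(r"(?P<test>a)?b(?(test)c|d)")

def matches(text):
    """Return True if the whole string fits the pattern (hypothetical helper)."""
    return PATTERN.fullmatch(text) is not None

# "abc" and "bd" fit; "abd" and "bc" pair the wrong ending with the prefix.
for sample in ["abc", "bd", "abd", "bc"]:
    print(sample, matches(sample))
```

fullmatch is used here so that a shorter match starting later in the string (such as bd inside abd) is not counted.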

My question is - Is it possible to have an OR in that condition at the end?

Here's an example pattern I wrote up to demonstrate what I'm trying to do. The pattern below tries to match one of two named groups, then uses a conditional to match another character only if the first named group was successfully matched:

(?:(?P<case1>a)?|(?P<case2>b)?)\|(?(case1)(?P<last>c)?)

And just to clarify what that's doing:

  1. Open up a non-capturing group, with two patterns:

    1.1. Match for the character a, assigning it to the named-group case1 if it is successfully matched

    1.2. Match for the character b, assigning it to the named-group case2 if it is successfully matched

  2. A conditional statement at the end, which reads:

If case1 was successfully matched, then match for the character c, assigning it to the named-group last if it is successfully matched

What I want is to change it so that step 2 would instead read:

If case1 OR case2 was successfully matched, then match for the character c, assigning it to the named-group last if it is successfully matched

I have tried all of the following:

(?:(?P<case1>a)?|(?P<case2>b)?)\|(?(case1|case2)(?P<last>c)?) 
(?:(?P<case1>a)?|(?P<case2>b)?)\|(?(?:(case1)|(case2))(?P<last>c)?)
(?:(?P<case1>a)?|(?P<case2>b)?)\|(?(case1,case2)(?P<last>c)?)
# Error (for all 3 patterns above): Invalid group structure, unmatched parenthesis

(?:(?P<case1>a)?|(?P<case2>b)?)\|(?:(?(case1)(?P<last>c)?)|(?(case2)(?P<last>c)?)) 
# Error: Subpattern name declared more than once
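For comparison, the sketch below feeds the same four attempts to Python's re module. Using Python here is an assumption (the error messages quoted above come from a different engine), and the helper name compile_error is hypothetical. Python also refuses to compile all four, although its messages are worded differently.

```python
import re

# Shared first half of every attempt.
PREFIX = r"(?:(?P<case1>a)?|(?P<case2>b)?)\|"

ATTEMPTS = [
    PREFIX + r"(?(case1|case2)(?P<last>c)?)",
    PREFIX + r"(?(?:(case1)|(case2))(?P<last>c)?)",
    PREFIX + r"(?(case1,case2)(?P<last>c)?)",
    PREFIX + r"(?:(?(case1)(?P<last>c)?)|(?(case2)(?P<last>c)?))",
]

def compile_error(pattern):
    """Return the error message if the pattern fails to compile, else None."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

for attempt in ATTEMPTS:
    print(attempt, "->", compile_error(attempt))
```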

So I'm kinda lost as to what else to do. I created a Regex101.com instance with an example. You can see there are two lines in the Text String; the pattern pulls out case1 and last from the first line, but only case2 from the second line. The goal is to capture last in both lines.
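The behaviour described above can be reproduced with a rough Python sketch. The two sample lines a|c and b|c are assumptions standing in for the Regex101 text string, and named_groups is a hypothetical helper.

```python
import re

# The original pattern: only case1 is consulted by the conditional.
PATTERN = re.compile(r"(?:(?P<case1>a)?|(?P<case2>b)?)\|(?(case1)(?P<last>c)?)")

def named_groups(text):
    """Return the named groups of the first match, or None if nothing matches."""
    found = PATTERN.search(text)
    return found.groupdict() if found else None

# Assumed sample lines: the first sets case1, the second sets case2.
for line in ["a|c", "b|c"]:
    print(line, named_groups(line))
```

On the second line the conditional sees that case1 is unset, so the c is never attempted and last stays empty.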

Thanks!

Upvotes: 2

Views: 1745

Answers (2)

user557597

Edit: updated for case3.
No workaround is necessary.

(Note: conditionals don't require workarounds; they work one way, and there is no need to kludge other parts of the pattern to use them. Learning how to use them is the best option.)


I think this is what you're trying to do:
(?:(?P<case1>a)?|(?P<case2>b)?|(?P<case3>c)?)\|(?P<last>(?(case1)z?|(?(case2)z?)))

https://regex101.com/r/tH6pU0/6

Explained

 (?:
      (?P<case1> a )?               # (1), Optional a
   |  (?P<case2> b )?               # (2), Optional b
   |  (?P<case3> c )?               # (3), Optional c
 )
 \|                            # Required |

 (?P<last>                     # (4 start)
      (?(case1)                     # Did case1 match
           z?                            # yes, get optional z
        |                              # or
           (?(case2)                     # Did case2 match
                z?                            # yes, get optional z
           )
      )
 )                             # (4 end)
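To see how the nested conditional behaves, here is a small Python sketch, assuming Python's re module handles this pattern the same way as the linked demo. The sample strings and the helper last_group are made up for illustration.

```python
import re

# The pattern from this answer, split over two lines for readability.
PATTERN = re.compile(
    r"(?:(?P<case1>a)?|(?P<case2>b)?|(?P<case3>c)?)\|"
    r"(?P<last>(?(case1)z?|(?(case2)z?)))"
)

def last_group(text):
    """Return what the 'last' group captured, or None if nothing matches."""
    found = PATTERN.search(text)
    return found.group("last") if found else None

# case1 and case2 both lead to the optional z; case3 does not.
for sample in ["a|z", "b|z", "c|z"]:
    print(sample, repr(last_group(sample)))
```

Because the last group wraps the whole conditional, it always takes part in the match and captures an empty string when neither case1 nor case2 matched.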

Upvotes: 1

Aran-Fey

Reputation: 43196

Regex doesn't have such a feature, no. But there are a few tricks/workarounds that can be used depending on the situation.

  • Workaround 1: If the two conditions are right next to each other, enclose them in another group: (?P<case1_or_2>(?P<case1>a)|(?P<case2>b))

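  • Workaround 2: Duplicate the then-pattern and else-pattern: (?:(?(case1)c|d)|(?(case2)c|d)). Note that, as written, the first alternative can still match d when only case2 matched; nesting the conditionals instead, (?(case1)c|(?(case2)c|d)), avoids that.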

  • Workaround 3: If possible, change your "condition" groups so that they capture nothing. If that's not possible, add new groups whose sole purpose is to capture nothing, which means this workaround can be used in any scenario. Backreferences to those empty groups then let you construct an OR condition like so: (?:(?:(?P=case1)|(?P=case2))c|(?!(?P=case1))(?!(?P=case2))d)
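Workaround 1 applied to the pattern from the question might look like the Python sketch below. Making the wrapper group optional and reusing the asker's last group are assumptions made to mirror the question; last_group is a hypothetical helper.

```python
import re

# case1_or_2 wraps both alternatives, so one conditional covers either case.
PATTERN = re.compile(
    r"(?P<case1_or_2>(?P<case1>a)|(?P<case2>b))?\|(?(case1_or_2)(?P<last>c)?)"
)

def last_group(text):
    """Return what the 'last' group captured, or None if it did not take part."""
    found = PATTERN.search(text)
    return found.group("last") if found else None

# Either prefix now enables the c; no prefix leaves 'last' unset.
for sample in ["a|c", "b|c", "|c"]:
    print(sample, repr(last_group(sample)))
```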

Workaround 3 in more detail:

(?:
    (?P<case1>)a # if "a" is matched, case1 captures an empty string
|
    (?P<case2>)b # if "b" is matched, case2 captures an empty string
)? # if neither a nor b is matched, neither case matches at all
\|
(?: # if either case matched, match "c":
    (?: 
        (?P=case1) # match either case1
    |
        (?P=case2) # or case2
    )
    c # followed by "c"
| # if neither case matched, match "d":
    (?! # assert case1 didn't match
        (?P=case1)
    )
    (?! # assert case2 didn't match either
        (?P=case2)
    )
    d # match "d"
)
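The verbose pattern above can be tried out with a short Python sketch. It assumes Python's re module, where a backreference to a group that never took part in the match fails; the helper name matches and the sample strings are made up for illustration.

```python
import re

# Workaround 3: the empty groups act as flags, tested via backreferences.
PATTERN = re.compile(
    r"""
    (?:
        (?P<case1>)a      # case1 captures an empty string if "a" follows
    |
        (?P<case2>)b      # case2 captures an empty string if "b" follows
    )?
    \|
    (?:
        (?: (?P=case1) | (?P=case2) )   # either flag is set
        c
    |
        (?! (?P=case1) )                # neither flag is set
        (?! (?P=case2) )
        d
    )
    """,
    re.VERBOSE,
)

def matches(text):
    """Return True if the whole string fits the pattern (hypothetical helper)."""
    return PATTERN.fullmatch(text) is not None

for sample in ["a|c", "b|c", "|d", "a|d", "b|d", "|c"]:
    print(sample, matches(sample))
```

fullmatch matters here: with search, a string such as b|d would still yield a match starting at the | character, because from that position neither flag is set.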

Upvotes: 2
