KarlP

Reputation: 379

regex optional word

I am trying to find a regex that will match each of the following cases from a set of ldap objectclass definitions - they're just strings really.

The variations in the syntax are tripping my regex up and I don't seem to be able to find a balance between the greedy nature of the match and the optional word "MAY".

( class1-OID NAME 'class1' SUP top STRUCTURAL MUST description MAY ( brand $ details $ role ) )

DESIRED OUTPUT: description
ACTUAL GROUP1: description
ACTUAL GROUP1 with ? on the MAY group: description MAY

( class2-OID NAME 'class2' SUP top STRUCTURAL MUST groupname MAY description )

DESIRED OUTPUT: groupname
ACTUAL GROUP1: groupname
ACTUAL GROUP1 with ? on the MAY group: groupname MAY description

( class3-OID NAME 'class3' SUP top STRUCTURAL MUST ( code $ name ) )

DESIRED OUTPUT: code $ name
ACTUAL GROUP1: no match
ACTUAL GROUP1 with ? on the MAY group: code $ name

( class4-OID NAME 'class4' SUP top STRUCTURAL MUST ( code $ name ) MAY ( group $ description ) )

DESIRED OUTPUT: code $ name
ACTUAL GROUP1: code $ name
ACTUAL GROUP1 with ? on the MAY group: code $ name

Using this:

MUST \(?([\w\$\-\s]+)\)?\s*(?:MAY)

matches lines 1, 2 and 4, but doesn't match the 3rd one with no MAY statement. Adding an optional "?" to the MAY group results in a good match for 3 and 4, but then the 1st and 2nd lines act greedily and run on into MAY (line 1) or the remainder of the string (line 2).

It feels like I need the regex to consider MAY as optional but also that if MAY is found it should stop - I don't seem to be able to find that balance.
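The behaviour described above can be reproduced with a short script. This is only a sketch: Python's `re` module is an assumption (the question does not say which regex engine is in use), and the names `SAMPLES`, `REQUIRED_MAY`, `OPTIONAL_MAY` and `group1` are made up for illustration. Captured values are stripped of surrounding whitespace before comparison.

```python
import re

# The four objectclass definitions from the question.
SAMPLES = [
    "( class1-OID NAME 'class1' SUP top STRUCTURAL MUST description MAY ( brand $ details $ role ) )",
    "( class2-OID NAME 'class2' SUP top STRUCTURAL MUST groupname MAY description )",
    "( class3-OID NAME 'class3' SUP top STRUCTURAL MUST ( code $ name ) )",
    "( class4-OID NAME 'class4' SUP top STRUCTURAL MUST ( code $ name ) MAY ( group $ description ) )",
]

# Pattern from the question: MAY is mandatory.
REQUIRED_MAY = re.compile(r"MUST \(?([\w\$\-\s]+)\)?\s*(?:MAY)")
# Same pattern with "?" added to the MAY group.
OPTIONAL_MAY = re.compile(r"MUST \(?([\w\$\-\s]+)\)?\s*(?:MAY)?")


def group1(pattern, text):
    """Return group 1 with surrounding whitespace removed, or None if no match."""
    m = pattern.search(text)
    return m.group(1).strip() if m else None


required = [group1(REQUIRED_MAY, s) for s in SAMPLES]
optional = [group1(OPTIONAL_MAY, s) for s in SAMPLES]

for sample, req, opt in zip(SAMPLES, required, optional):
    print(sample)
    print("  MAY required:", repr(req))
    print("  MAY optional:", repr(opt))
```

With MAY mandatory, the engine has to backtrack out of the character class until it can match MAY, which is why lines 1, 2 and 4 come out right and line 3 fails. Once MAY is optional, nothing forces that backtracking, so the class (which includes `\s` and word characters) swallows "MAY" as well.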

Upvotes: 5

Views: 6287

Answers (1)

Wiktor Stribiżew

Reputation: 626738

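If you can use a regex with two capturing groups, you may use the pattern below. The captured value ends up in Group 1 when the attribute list is parenthesized and in Group 2 when it is a single bare word.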

MUST\s+(?:\(([^()]+)\)|(\S+))\s*(?:MAY)?

See the regex demo

Details

  • MUST - a word MUST
  • \s+ - 1+ whitespaces
  • (?:\(([^()]+)\)|(\S+)) - two alternatives:
    • \( - a ( char
    • ([^()]+) - Group 1: 1+ chars other than ( and )
    • \) - a ) char
    • | - or
    • (\S+) - Group 2: one or more non-whitespace chars
  • \s* - 0+ whitespaces
  • (?:MAY)? - an optional word MAY
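As a rough illustration of how the two groups might be consumed, here is a minimal Python sketch. Python is an assumption (the question does not name a language), and `MUST_RE`, `must_attributes` and `SAMPLES` are hypothetical names. Because only one of the two groups participates in any given match, the helper picks whichever is set and strips the spaces that sit just inside the parentheses.

```python
import re

# Pattern from the answer: Group 1 = parenthesized list, Group 2 = single word.
MUST_RE = re.compile(r"MUST\s+(?:\(([^()]+)\)|(\S+))\s*(?:MAY)?")

# The four objectclass definitions from the question.
SAMPLES = [
    "( class1-OID NAME 'class1' SUP top STRUCTURAL MUST description MAY ( brand $ details $ role ) )",
    "( class2-OID NAME 'class2' SUP top STRUCTURAL MUST groupname MAY description )",
    "( class3-OID NAME 'class3' SUP top STRUCTURAL MUST ( code $ name ) )",
    "( class4-OID NAME 'class4' SUP top STRUCTURAL MUST ( code $ name ) MAY ( group $ description ) )",
]


def must_attributes(definition):
    """Return the MUST value of one definition, or None if there is no MUST clause."""
    m = MUST_RE.search(definition)
    if m is None:
        return None
    # Exactly one of the two groups is populated for any successful match.
    value = m.group(1) if m.group(1) is not None else m.group(2)
    return value.strip()


results = [must_attributes(s) for s in SAMPLES]
for sample, value in zip(SAMPLES, results):
    print(repr(value), "<-", sample)
```

The pattern avoids the greediness problem because neither alternative can run past the MUST value: `[^()]+` stops at the closing parenthesis and `\S+` stops at the first whitespace, so the trailing optional MAY has no effect on what is captured.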

Upvotes: 8
