Reputation: 379
I am trying to find a regex that will extract the MUST attribute(s) from each of the following LDAP objectclass definitions (they're just strings really).
The variations in the syntax are tripping my regex up, and I can't find a balance between the greedy nature of the match and the optional word "MAY".
( class1-OID NAME 'class1' SUP top STRUCTURAL MUST description MAY ( brand $ details $ role ) )
DESIRED OUTPUT: description
ACTUAL GROUP1: description
ACTUAL GROUP1 with ? on the MAY group: description MAY
( class2-OID NAME 'class2' SUP top STRUCTURAL MUST groupname MAY description )
DESIRED OUTPUT: groupname
ACTUAL GROUP1: groupname
ACTUAL GROUP1 with ? on the MAY group: groupname MAY description
( class3-OID NAME 'class3' SUP top STRUCTURAL MUST ( code $ name ) )
DESIRED OUTPUT: code $ name
ACTUAL GROUP1: no match
ACTUAL GROUP1 with ? on the MAY group: code $ name
( class4-OID NAME 'class4' SUP top STRUCTURAL MUST ( code $ name ) MAY ( group $ description ) )
DESIRED OUTPUT: code $ name
ACTUAL GROUP1: code $ name
ACTUAL GROUP1 with ? on the MAY group: code $ name
Using this:
MUST \(?([\w\$\-\s]+)\)?\s*(?:MAY)
(Regex101)
This matches lines 1, 2 and 4, but doesn't match the 3rd one, which has no MAY statement.
Making the MAY group optional with "?" gives a good match for lines 3 and 4, but then lines 1 and 2 match greedily and run on into MAY (line 1) or the remainder of the string (line 2).
It feels like I need the regex to treat MAY as optional, but also to stop if MAY is found. I can't seem to find that balance.
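For anyone who wants to reproduce the behaviour described above, here is a minimal sketch. It assumes Python's `re` module (the question does not name a language), and the names `SAMPLES`, `REQUIRED_MAY`, `OPTIONAL_MAY` and `first_group` are made up for illustration. Group 1 is stripped of surrounding whitespace before it is returned.

```python
import re

# The four sample definitions from the question.
SAMPLES = [
    "( class1-OID NAME 'class1' SUP top STRUCTURAL MUST description MAY ( brand $ details $ role ) )",
    "( class2-OID NAME 'class2' SUP top STRUCTURAL MUST groupname MAY description )",
    "( class3-OID NAME 'class3' SUP top STRUCTURAL MUST ( code $ name ) )",
    "( class4-OID NAME 'class4' SUP top STRUCTURAL MUST ( code $ name ) MAY ( group $ description ) )",
]

# The original pattern, where MAY is required.
REQUIRED_MAY = re.compile(r"MUST \(?([\w\$\-\s]+)\)?\s*(?:MAY)")
# The same pattern with "?" added to the MAY group.
OPTIONAL_MAY = re.compile(r"MUST \(?([\w\$\-\s]+)\)?\s*(?:MAY)?")


def first_group(pattern, text):
    """Return group 1 with surrounding whitespace stripped, or None if no match."""
    match = pattern.search(text)
    return match.group(1).strip() if match else None


for sample in SAMPLES:
    print(repr(first_group(REQUIRED_MAY, sample)),
          "|",
          repr(first_group(OPTIONAL_MAY, sample)))
```

With MAY required, the third sample returns `None`. With MAY optional, the character class `[\w\$\-\s]` happily consumes the letters of "MAY" itself, which is why the first two samples run on.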
Upvotes: 5
Views: 6287
Reputation: 626738
If you can use a regex with two capturing groups, you may use:
MUST\s+(?:\(([^()]+)\)|(\S+))\s*(?:MAY)?
See the regex demo
Details
MUST - the literal word MUST
\s+ - 1+ whitespace chars
(?:\(([^()]+)\)|(\S+)) - a non-capturing group with two alternatives:
  \( - a ( char
  ([^()]+) - Group 1: 1+ chars other than ( and )
  \) - a ) char
  | - or
  (\S+) - Group 2: 1+ non-whitespace chars
\s* - 0+ whitespace chars
(?:MAY)? - an optional word MAY
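As a rough illustration of how the two groups might be consumed, here is a minimal sketch. It assumes Python's `re` module (the question does not name a language), and `PATTERN` and `must_attributes` are hypothetical names. Only one of the two groups participates in any given match, so the helper returns whichever one is set.

```python
import re

# The two-group pattern from this answer.
PATTERN = re.compile(r"MUST\s+(?:\(([^()]+)\)|(\S+))\s*(?:MAY)?")


def must_attributes(definition):
    """Return the MUST attribute text of one definition, or None if absent.

    Group 1 holds a parenthesised list, group 2 a single bare attribute.
    """
    match = PATTERN.search(definition)
    if match is None:
        return None
    value = match.group(1) if match.group(1) is not None else match.group(2)
    # Group 1 keeps the spaces just inside the parentheses, so trim them.
    return value.strip()


print(must_attributes(
    "( class1-OID NAME 'class1' SUP top STRUCTURAL MUST description MAY ( brand $ details $ role ) )"
))  # description
print(must_attributes(
    "( class3-OID NAME 'class3' SUP top STRUCTURAL MUST ( code $ name ) )"
))  # code $ name
```

Because `[^()]+` also matches the spaces just inside the parentheses, Group 1 needs trimming. The trailing `\s*(?:MAY)?` is entirely optional, so it does not change what either group captures.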
Upvotes: 8