Reputation: 18861
I'm building a validation rule parser.
Here's a regex that is meant to verify if a rule is complete and valid:
/^\w+ \| (?: (?:\w+ | \w+=\{(?:[\w.]+,?)+\} | \w+=[\w.]+) (?:,|$) )+$/ix
What I want to match is a string of this format:
identifier | options
Where options is a comma-separated list of:
flag (a sequence of \w)
key=value
list={1,2,45,foo_bar,with.dot}
It works, but has some issues with trailing commas:
This should not match (trailing comma at the end):
potato|lemon,aa=bb,list={12,45,a.b,_s},foo,yes=no,
Neither should this (trailing comma at the end of a list):
potato|lemon,aa=bb,list={12,45,a.b,_s,},foo,yes=no
Suggestions welcome - feel free to tinker.
Upvotes: 2
Views: 3113
Reputation: 2632
There's a better way to write this kind of regexp, in a grammar-like style:
/
(?(DEFINE)
(?<entry> (?&identifier) \| (?&options) )
(?<identifier> \w+ )
(?<options> (?&option) , (?&options) | (?&option) )
(?<option> (?&list) | (?&keyvalue) | (?&flag) )
(?<flag> \w+ )
(?<keyvalue> (?&key) = (?&value) )
(?<key> \w+ )
(?<value> \w+ )
(?<list> list= \{ (?&list_entries) \} )
(?<list_entries> (?&list_entry) , (?&list_entries) | (?&list_entry) )
(?<list_entry> [A-Za-z0-9._]+ )
)
^(?&entry)$
/x
See it in action: https://regex101.com/r/hF3pO2/10
It's easier to understand and tweak, but there are a few quirks. One is that left recursion doesn't work: you cannot write the rule as options: options "," option | option, you have to write it as options: option "," options | option. Another is that you can only use it for validation. You cannot extract data with this kind of pattern, because groups matched through the (?(DEFINE)) block do not keep their captures once the subroutine call returns.
Based on Nikita Popov's article on how powerful (and not really regular) PCRE regexps are: https://nikic.github.io/2012/06/15/The-true-power-of-regular-expressions.html
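For anyone without (?(DEFINE)) support, the same grammar-like layout can be approximated by building the pattern from named strings. The sketch below uses Python, which is an assumption on my part (the pattern above targets PCRE). Python's re module has no subroutine calls, so each rule is an ordinary string spliced into the rules that use it, and the two right-recursive rules are rewritten as "one item, then any number of ,item". The names is_valid and parse are made up for illustration. Because the pieces are plain strings, they can be reused with named groups in a second pass, which is one way around the no-extraction limitation:

```python
import re

# Rule names mirror the (?(DEFINE)) block above. Each rule is a plain string,
# spliced into the rules that use it (Python's re has no (?&name) calls).
IDENTIFIER = r"\w+"
FLAG = r"\w+"
KEY = r"\w+"
VALUE = r"\w+"  # same as the <value> rule above
KEYVALUE = KEY + "=" + VALUE
LIST_ENTRY = r"[A-Za-z0-9._]+"
# Right recursion rewritten as repetition: entry ("," entry)*
LIST_ENTRIES = LIST_ENTRY + "(?:," + LIST_ENTRY + ")*"
LIST = r"list=\{" + LIST_ENTRIES + r"\}"
OPTION = "(?:" + LIST + "|" + KEYVALUE + "|" + FLAG + ")"
OPTIONS = OPTION + "(?:," + OPTION + ")*"
ENTRY = re.compile(IDENTIFIER + r"\|" + OPTIONS)

# Second pass: the same pieces, now with named groups, to classify each option.
TOKEN = re.compile(
    "(?P<list>" + LIST + ")|(?P<keyvalue>" + KEYVALUE + ")|(?P<flag>" + FLAG + ")"
)


def is_valid(rule):
    """Return True if the whole string is identifier|option[,option...]."""
    return ENTRY.fullmatch(rule) is not None


def parse(rule):
    """Return (identifier, [(kind, text), ...]); raise ValueError if invalid."""
    if not is_valid(rule):
        raise ValueError("invalid rule: " + rule)
    identifier, _, options = rule.partition("|")
    return identifier, [(m.lastgroup, m.group()) for m in TOKEN.finditer(options)]


if __name__ == "__main__":
    print(parse("potato|lemon,aa=bb,list={12,45,a.b,_s},foo,yes=no"))
    print(is_valid("potato|lemon,aa=bb,list={12,45,a.b,_s},foo,yes=no,"))
    print(is_valid("potato|lemon,aa=bb,list={12,45,a.b,_s,},foo,yes=no"))
```

Since the comma is a mandatory separator between items here, neither kind of trailing comma can match.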
Upvotes: 0
Reputation: 18861
I'll add my own solution here as well, just for completeness:
^ \w+ \| (?: (?: \w++ | \w++=\{(?:[\w.]++ (?:,(?!\}))?)+\} | \w++=[\w.]++ ) (?:,(?!$))? )+ $
I'm not sure if it's correct, but it worked on the inputs I tried. Like the pattern in the question, it needs the x flag.
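A quick way to probe the pattern is to run it over a few inputs. The sketch below is Python, which is an assumption (the question does not name a host language), and it swaps every possessive ++ for a plain +, because possessive quantifiers only arrived in Python 3.11. That swap can only make the pattern more permissive. The accepts helper is a made-up name. The pattern rejects both trailing-comma cases from the question, but the check also exposes a gap: the comma after an item is optional, so an input such as potato|list={1}foo gets through. As far as I can tell, the possessive original accepts that input too.

```python
import re

# The pattern from above with each possessive "++" written as a plain "+"
# (an adaptation for Python versions before 3.11, not the original pattern).
PATTERN = re.compile(
    r"^ \w+ \| (?: (?: \w+ | \w+=\{(?:[\w.]+ (?:,(?!\}))?)+\} | \w+=[\w.]+ )"
    r" (?:,(?!$))? )+ $",
    re.X,
)


def accepts(rule):
    """Return True if the adapted pattern matches the whole rule."""
    return PATTERN.match(rule) is not None


CASES = [
    "potato|lemon,aa=bb,list={12,45,a.b,_s},foo,yes=no",   # well-formed
    "potato|lemon,aa=bb,list={12,45,a.b,_s},foo,yes=no,",  # trailing comma
    "potato|lemon,aa=bb,list={12,45,a.b,_s,},foo,yes=no",  # trailing comma in list
    "potato|list={1}foo",                                  # comma missing after list
]

if __name__ == "__main__":
    for rule in CASES:
        print(accepts(rule), rule)
```

Making the comma mandatory between items, in an item(?:,item)* shape, should close that gap.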
Upvotes: 1
Reputation: 89547
If you only need to check the syntax, you can use:
\A\w+\|(\w++(?:=(?:{[\w.]++(?>,[\w.]+)*}|[\w.]+))?)(?:,(?1))*\z
see the demo (in multiline mode)
Upvotes: 3
Reputation: 1303
Updated solution: Since you don't need specific group matching, I just check that every comma is followed by another character. This relies on every matching alternative starting with \w (the capturing group in the demo is there just for visual understanding).
So (?:,\w)? for the list={X,Y,Z} and (,\w)? for the end of the string.
Check it out: http://regex101.com/r/hF3pO2/7
^ \w+ \| (?: (?: \w+ | \w+=\{ (?:[\w.]+ (?:,\w)?)+ \} | \w+=[\w.]+ ) (,\w)? )+$
With the gixm flags.
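One thing to watch with ,\w is that it consumes the character after the comma, so the next item starts one character late. A one-character item in the middle (the 2 in list={1,2,45}, or the b in a,b=c) then has nothing left to match. The Python sketch below (the language is an assumption; only the x and i flags are applied, since g and m make no difference for a single string) compares the pattern above with a variant that swaps the consumed character for a lookahead. The LOOKAHEAD variant is my adaptation, not part of the original pattern, and it only addresses this one issue:

```python
import re

FLAGS = re.X | re.I

# The updated pattern from above, unchanged.
CONSUMING = re.compile(
    r"^ \w+ \| (?: (?: \w+ | \w+=\{ (?:[\w.]+ (?:,\w)?)+ \} | \w+=[\w.]+ )"
    r" (,\w)? )+$",
    FLAGS,
)

# Hypothetical variant: the comma must be followed by a valid start
# character, but that character is only peeked at, not consumed.
LOOKAHEAD = re.compile(
    r"^ \w+ \| (?: (?: \w+ | \w+=\{ (?:[\w.]+ (?:,(?=[\w.]))?)+ \} | \w+=[\w.]+ )"
    r" (?:,(?=\w))? )+$",
    FLAGS,
)

CASES = [
    "potato|lemon,aa=bb,list={12,45,a.b,_s},foo,yes=no",   # well-formed
    "potato|lemon,aa=bb,list={12,45,a.b,_s},foo,yes=no,",  # trailing comma
    "potato|lemon,aa=bb,list={12,45,a.b,_s,},foo,yes=no",  # trailing comma in list
    "potato|list={1,2,45,foo_bar,with.dot}",               # one-character list entries
    "x|a,b=c",                                             # one-character key
]

if __name__ == "__main__":
    for rule in CASES:
        print(bool(CONSUMING.match(rule)), bool(LOOKAHEAD.match(rule)), rule)
```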
Old (close but not quite) solution:
I am not a regex expert, so maybe there is a better way, but I could verify that the string does not end with a comma (,$) using this:
(?:,[^$]|[^,]$)
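So I added that for the list={X,Y,Z} and for the end of the line. (Bear in mind that inside a character class $ is a literal, so [^$] means "any character except a dollar sign".) The whole regex now looks like this: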
^ \w+ \| (?: (?: \w+ | \w+=\{ (?:[\w.]+ (?:,[^$]|[^,]$)?)+ \} | \w+=[\w.]+ )(?:,[^$]|[^,]$) )+$
Have a look: http://regex101.com/r/hF3pO2/3
Upvotes: 4