Reputation: 9376
I have two optional values, and when both are present, a comma needs to be in between them. If one or both values are present, there may be a trailing comma, but if no values are present, no comma is allowed.
Valid examples:
(first,second,)
(first,second)
(first,)
(first)
(second,)
(second)
()
Invalid examples:
(first,first,)
(first,first)
(second,second,)
(second,second)
(second,first,)
(second,first)
(,first,second,)
(,first,second)
(,first,)
(,first)
(,second,)
(,second)
(,)
(,first,first,)
(,first,first)
(,second,second,)
(,second,second)
(,second,first,)
(,second,first)
I have EBNF code (XML-flavored) that suffices, but is there a way I can simplify it? I would like to make it more readable / less repetitive.
tuple ::= "(" ( ( "first" | "second" | "first" "," "second" ) ","? )? ")"
If it’s easier to understand in regex, here’s the equivalent code, but I need a solution in EBNF.
/\(((first|second|first\,second)\,?)?\)/
And here’s a helpful railroad diagram:
This question becomes even more complex when we abstract it to three terms: "first"
, "second"
, and "third"
are all optional, but they must appear in that order, separated by commas, with an optional trailing comma. The best I can come up with is a brute-force method:
"(" (("first" | "second" | "third" | "first" "," "second" | "first" "," "third" | "second" "," "third" | "first" "," "second" "," "third") ","?)? ")"
Clearly, a solution involving O(2n) complexity is not very desirable.
Upvotes: 0
Views: 469
Reputation: 9376
I found a way to simplify it, but not by much:
"(" ( ("first" ("," "second")? | "second") ","? )? ")"
For the three-term solution, take the two-term solution and prepend a first term:
"(" (("first" ("," ("second" ("," "third")? | "third"))? | "second" ("," "third")? | "third") ","?)? ")"
For any (n+1)-term solution, take the n-term solution and prepend a first term. This complexity is O(n), which is significantly better than O(2n).
Upvotes: 1
Reputation: 40047
I'm not familiar with EBNF but I am familiar with BNF and parser grammars. The following is just a variation of what you have based on my own regex. I am assuming the unquoted parens are not considered tokens and are used to group related elements.
tuple ::= ( "(" ( "first,second" | "first" | "second" ) ","? ")" ) | "()"
(first,second
or (first
or (second
,
)
()
But I doubt this is an improvement.
Here is my Java test code. The first two lines of strings in the test data match. The others do not.
String[] testdata = {
"(first,second,)", "(first,second)", "(first,)", "(first)",
"(second,)", "(second)", "()",
"(first,first,)", "(first,first)", "(second,second,)",
"(second,second)", "(second,first,)", "(second,first)",
"(,first,second,)", "(,first,second)", "(,first,)", "(,first)",
"(,second,)", "(,second)", "(,)", "(,first,first,)",
"(,first,first)", "(,second,second,)", "(,second,second)",
"(,second,first,)", "(,second,first)"
};
String reg = "\\(((first,second)|first|second),?\\)|\\(\\)";
Pattern p = Pattern.compile(reg);
for (String t : testdata) {
Matcher m = p.matcher(t);
if (m.matches()) {
System.out.println(t);
}
}
Upvotes: 0
Reputation: 27743
This expression might help you to maybe design a better expression. You can do this with only using capturing groups and swipe from left to right and pass your possible inputs, maybe similar to this:
\((first|second|)(,|)(second|)([\)|,]+)
I'm just guessing that you wish to capture the middle comma:
This may not be the exact expression you want. However, it might show you how this might be done in a simple way:
^(?!\(,)\((first|)(,|)(second|)([\)|,]+)$
You can add more boundaries to the left and right of your expression, maybe similar to this expression:
This graph shows how the second expression would work:
This JavaScript snippet shows the performance of the second expression using a simple 1-million times for
loop, and how it captures first
and second
using $1
and $3
.
repeat = 1000000;
start = Date.now();
for (var i = repeat; i >= 0; i--) {
var string = "(first,second,)";
var regex = /^(?!\(,)\((first|second|)(,|)(second|)([\)|,]+)$/gms;
var match = string.replace(regex, "$1 and $3");
}
end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");
Upvotes: 0