chharvey
chharvey

Reputation: 9376

EBNF for capturing a comma between two optional values

I have two optional values, and when both are present, a comma needs to be in between them. If one or both values are present, there may be a trailing comma, but if no values are present, no comma is allowed.

Valid examples:

(first,second,)
(first,second)
(first,)
(first)
(second,)
(second)
()

Invalid examples:

(first,first,)
(first,first)
(second,second,)
(second,second)
(second,first,)
(second,first)
(,first,second,)
(,first,second)
(,first,)
(,first)
(,second,)
(,second)
(,)
(,first,first,)
(,first,first)
(,second,second,)
(,second,second)
(,second,first,)
(,second,first)

I have EBNF code (XML-flavored) that suffices, but is there a way I can simplify it? I would like to make it more readable / less repetitive.

tuple ::= "(" ( ( "first" | "second" | "first" "," "second" ) ","? )? ")"

If it’s easier to understand in regex, here’s the equivalent code, but I need a solution in EBNF.

/\(((first|second|first\,second)\,?)?\)/

And here’s a helpful railroad diagram: "(" ( ( "first" | "second" | "first" "," "second" ) ","? )? ")"

This question becomes even more complex when we abstract it to three terms: "first", "second", and "third" are all optional, but they must appear in that order, separated by commas, with an optional trailing comma. The best I can come up with is a brute-force method:

"(" (("first" | "second" | "third" | "first" "," "second" | "first" "," "third" | "second" "," "third" | "first" "," "second" "," "third") ","?)? ")"

"(" (("first" | "second" | "third" | "first" "," "second" | "first" "," "third" | "second" "," "third" | "first" "," "second" "," "third") ","?)? ")" Clearly, a solution involving O(2n) complexity is not very desirable.

Upvotes: 0

Views: 469

Answers (3)

chharvey
chharvey

Reputation: 9376

I found a way to simplify it, but not by much:

"(" ( ("first" ("," "second")? | "second") ","? )? ")"

"(" ( ("first" ("," "second")? | "second") ","? )? ")"

For the three-term solution, take the two-term solution and prepend a first term:

"(" (("first" ("," ("second" ("," "third")? | "third"))? | "second" ("," "third")? | "third") ","?)? ")"

"(" (("first" ("," ("second" ("," "third")? | "third"))? | "second" ("," "third")? | "third") ","?)? ")"

For any (n+1)-term solution, take the n-term solution and prepend a first term. This complexity is O(n), which is significantly better than O(2n).

Upvotes: 1

WJS
WJS

Reputation: 40047

I'm not familiar with EBNF but I am familiar with BNF and parser grammars. The following is just a variation of what you have based on my own regex. I am assuming the unquoted parens are not considered tokens and are used to group related elements.

  tuple ::= ( "(" ( "first,second" | "first" | "second" ) ","? ")" ) | "()"
  • It matches on either (first,second or (first or (second
  • Then it matches on an optional ,
  • Followed by a closing parens. )
  • or the empty parens grouping. ()

But I doubt this is an improvement.

Here is my Java test code. The first two lines of strings in the test data match. The others do not.

      String[] testdata = {
            "(first,second,)", "(first,second)", "(first,)", "(first)",
            "(second,)", "(second)", "()",

            "(first,first,)", "(first,first)", "(second,second,)",
            "(second,second)", "(second,first,)", "(second,first)",
            "(,first,second,)", "(,first,second)", "(,first,)", "(,first)",
            "(,second,)", "(,second)", "(,)", "(,first,first,)",
            "(,first,first)", "(,second,second,)", "(,second,second)",
            "(,second,first,)", "(,second,first)"
      };

      String reg = "\\(((first,second)|first|second),?\\)|\\(\\)";
      Pattern p = Pattern.compile(reg);

      for (String t : testdata) {
         Matcher m = p.matcher(t);
         if (m.matches()) {
            System.out.println(t);
         }
      }

Upvotes: 0

Emma
Emma

Reputation: 27743

This expression might help you to maybe design a better expression. You can do this with only using capturing groups and swipe from left to right and pass your possible inputs, maybe similar to this:

\((first|second|)(,|)(second|)([\)|,]+)

I'm just guessing that you wish to capture the middle comma:

enter image description here

This may not be the exact expression you want. However, it might show you how this might be done in a simple way:

^(?!\(,)\((first|)(,|)(second|)([\)|,]+)$

enter image description here

You can add more boundaries to the left and right of your expression, maybe similar to this expression:

enter image description here

This graph shows how the second expression would work:

enter image description here

Performance

This JavaScript snippet shows the performance of the second expression using a simple 1-million times for loop, and how it captures first and second using $1 and $3.

repeat = 1000000;
start = Date.now();

for (var i = repeat; i >= 0; i--) {
	var string = "(first,second,)";
	var regex = /^(?!\(,)\((first|second|)(,|)(second|)([\)|,]+)$/gms;
	var match = string.replace(regex, "$1 and $3");
}

end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");

Upvotes: 0

Related Questions