Reputation: 11968
I'm working on a grammar for a templating system. I've hit a snag in the build and I can't quite figure out how to solve this issue. I've simplified down the test case to best emphasize exactly what I'm doing.
Example Strings:
(foo) - works
(foo()) - fails Expecting 'parenEnd', got 'parenInterior'
foo (foo) bar
foo (foo(function() { console.log('stuff'); })) bar
foo (foo.bar.baz("stuff")) bar
The rules are that within a parenthetical, anything goes: any characters at all. I don't need to validate the contents or ensure they match a proper format. On the other hand, as I understand it, for the parser to function I do need to keep track of opening and closing ( and ), otherwise the lexer can't know where one parenthetical statement ends and another begins, as in (foo()) (bar). To keep track of that I'm using a paren start condition, which increments a counter whenever an open paren is hit inside a paren statement and decrements it when a close paren is hit.
The problem is it's just not working. The main culprit is that it never appears to hit my <paren>")" rule, and yet I'm hitting the <paren>"(" rule just fine. They appear syntactically the same, so why is one working and the other not?
Grammar
%lex
%x paren
%%
\s+ /* skip whitespace */
<INITIAL>"(" { this.begin("paren"); parenCount = 1; return "parenStart"; };
<paren>"(" { console.log("parenStart", parenCount); parenCount++; return "parenInterior"; };
<paren>")" { console.log("parenEnd", parenCount); parenCount--; if (parenCount === 0) { this.popState(); return "parenEnd"; } else { return "parenInterior"; } };
<paren>[^\)\(]+ { console.log(this); return "parenInterior"; };
<<EOF>> return 'EOF';
. return 'INVALID';
/lex
%start expressions
%% /* language grammar */
expressions
: parenStart parenInterior parenEnd { return $1 + $2 + $3; }
;
%%
parenCount = 0;
Upvotes: 0
Views: 282
Reputation: 151441
I believe your problem is that your grammar does not accept a sequence of tokens inside the parentheses. If I change your grammar to this, I get something that can handle the strings you've shown in your question:
%lex
%x paren
%%
\s+ /* skip whitespace */
<INITIAL>"(" { this.begin("paren"); parenCount = 1; return "parenStart"; };
<paren>"(" { console.log("parenStart", parenCount); parenCount++; return "parenInterior"; };
<paren>")" { console.log("parenEnd", parenCount); parenCount--; if (parenCount === 0) { this.popState(); return "parenEnd"; } else { return "parenInterior"; } };
<paren>[^\)\(]+ { console.log(this); return "parenInterior"; };
<<EOF>> return 'EOF';
. return 'WHATEVER';
/lex
%start expressions
%% /* language grammar */
expressions
: whateverSeq parenStart parenInteriorSeq parenEnd whateverSeq EOF { return $1 + $2 + $3 + $4 + $5; }
;
parenInteriorSeq
: parenInterior
| parenInteriorSeq parenInterior -> $1.concat($2)
;
whateverSeq
: -> "" // Empty sequence.
| whatevers // One or more WHATEVER tokens.
;
whatevers
: WHATEVER
| whatevers WHATEVER -> $1.concat($2)
;
%%
parenCount = 0;
Then there's no problem with nesting parentheses.
Salient changes:
Replaced INVALID with WHATEVER, and added rules to accept a sequence of WHATEVER tokens at the start and end. This allows input like blah (foo) blah.
Replaced parenInterior with parenInteriorSeq so that you can have a sequence of parenInterior tokens inside the parentheses. This is necessary because in a string like (foo()), foo is one token, the next ( is another token and the next ) is another token. So you have to accept a list of tokens.
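To see the token stream concretely, here is a rough hand-written imitation of the lexer rules in plain JavaScript. It is only a sketch: tokenize and tokenTypes are hypothetical helpers, not anything Jison generates, and the sketch always appends EOF, which glosses over what the real lexer does when input ends inside an unclosed parenthetical.

```javascript
// Hypothetical stand-in for the lexer rules in the grammar; not Jison's generated code.
function tokenize(input) {
  const tokens = [];
  let parenCount = 0; // 0 plays the role of the INITIAL state, > 0 the "paren" state
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (parenCount === 0) {
      if (/\s/.test(ch)) { i++; continue; } // \s+ : skip whitespace
      if (ch === "(") {
        parenCount = 1;                      // <INITIAL>"(" enters the paren state
        tokens.push(["parenStart", ch]);
      } else {
        tokens.push(["WHATEVER", ch]);       // the catch-all "." rule
      }
      i++;
    } else if (ch === "(") {
      parenCount++;                          // <paren>"("
      tokens.push(["parenInterior", ch]);
      i++;
    } else if (ch === ")") {
      parenCount--;                          // <paren>")"
      tokens.push([parenCount === 0 ? "parenEnd" : "parenInterior", ch]);
      i++;
    } else {
      // <paren>[^\)\(]+ : longest run of characters that are not parentheses
      let j = i;
      while (j < input.length && input[j] !== "(" && input[j] !== ")") j++;
      tokens.push(["parenInterior", input.slice(i, j)]);
      i = j;
    }
  }
  tokens.push(["EOF", ""]); // simplification: always emitted here
  return tokens;
}

// Helper that shows only the token names, in order.
const tokenTypes = (s) => tokenize(s).map(([type]) => type).join(" ");

console.log(tokenTypes("(foo)"));
// parenStart parenInterior parenEnd EOF
console.log(tokenTypes("(foo())"));
// parenStart parenInterior parenInterior parenInterior parenEnd EOF
```

The second result has three parenInterior tokens in a row, which is why a rule that allows exactly one parenInterior reports "Expecting 'parenEnd', got 'parenInterior'" on the second of them.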
Upvotes: 1