Reputation: 55
I've tried to write simple parser with jison (http://zaa.ch/jison/docs/) at stuck in describing text.
%lex
%%
[\s\n\t]+ return 'TK_SPACE';
[0-9]+("."[0-9]+)?\b return 'TK_NUMBER';
[a-zA-Z]+([a-zA-Z0-9]+)?\b return 'TK_WORD';
<<EOF>> return 'EOF';
/lex
%start document
%%
document
: nodes EOF
{ console.log($1); }
| EOF
;
nodes
: nodes node
{ $1.push($2); $$ = $1; }
| node
{ $$ = [$1]; }
;
node
: text
;
text
: text text_element
{ $$ = $1 + $2; }
| text_element
;
text_element
: TK_NUMBER
| TK_WORD
| TK_SPACE
;
This grammar compiled with warnings.
Conflict in grammar: multiple actions possible when lookahead token is TK_SPACE in state 5
- reduce by rule: node -> text
- shift token (then go to state 9)
Conflict in grammar: multiple actions possible when lookahead token is TK_WORD in state 5
- reduce by rule: node -> text
- shift token (then go to state 8)
Conflict in grammar: multiple actions possible when lookahead token is TK_NUMBER in state 5
- reduce by rule: node -> text
- shift token (then go to state 7)
States with conflicts:
State 5
node -> text . #lookaheads= TK_SPACE TK_WORD TK_NUMBER EOF
text -> text .text_element #lookaheads= EOF TK_NUMBER TK_WORD TK_SPACE
text_element -> .TK_NUMBER
text_element -> .TK_WORD
text_element -> .TK_SPACE
But if i try to parse text it works fine. This is not a full version of code, just version with text. I want to append nodes at node
in feature.
Upvotes: 1
Views: 920
Reputation: 126358
The problem is your grammar is ambiguous -- a nodes
consists of a sequence of one or mode node
with no separators. A node
is a text
which consists of one or more text_element
again with no separators. So there's no way to tell when one node
ends and the next on begins.
As an example, if you have a sequence of 3 text_elements
in you input, it might be a single node
with all 3, or it might be 3 node
each with one.
Bison will "resolve" this conflict by always preferring shift over reduce, which will always prefer making larger text
objects so the rule nodes: nodes node
will never be reduced, and might as well just be removed from the grammar. Since this is a pure ambiguity (not a lookahead problem) the resulting grammar matches the same language, so that may not be a problem. I assume jison (or whatever parser generator you are actually using) is the same.
In general, however, conflicts are a problem because it means that the grammar parsed by the generated parser is not the grammar you specififed. Figuring out what grammar is actually parsed by the resulting parser is non-trivial and requires careful understanding of how shoft-reduce parsing works and the states actually generarated by the parser generator. The information is all there in the .output
file (produced by bison with -v
-- other generators may be different), but you need to read and understand it.
Upvotes: 4