Reputation: 150
I am writing a parser using Bison, but can't seem to get the grammar correct.
Here are some of the rules used around conflict one:
program : function END_OF_FILE {return 0;}
formal_parameters : OPEN_PAREN formal_parameter list_E_fparameter CLOSE_PAREN | OPEN_PAREN CLOSE_PAREN
formal_parameter : expression_parameter | function_parameter
function : return_options IDENTIFIER formal_parameters block
function_parameter : return_options IDENTIFIER formal_parameters
expression_parameter : VAR identifier_list IDENTIFIER | identifier_list IDENTIFIER
variable_creation : identifier_list COLON type SEMI_COLON
labels : LABELS identifier_list SEMI_COLON
list_E_identifiers : list_E_identifiers COMMA IDENTIFIER |
identifier_list : IDENTIFIER list_E_identifiers
return_options : VOID | IDENTIFIER
State 12 conflicts: 1 reduce/reduce
state 12
56 identifier_list: IDENTIFIER . list_E_identifiers
60 return_options: IDENTIFIER .
102 list_E_identifiers: . list_E_identifiers COMMA IDENTIFIER
103 | .
COMMA reduce using rule 103 (list_E_identifiers)
IDENTIFIER reduce using rule 60 (return_options)
IDENTIFIER [reduce using rule 103 (list_E_identifiers)]
$default reduce using rule 60 (return_options)
list_E_identifiers go to state 23
State 64 conflicts: 1 shift/reduce
state 64
8 body: OPEN_BRACE list_E_statement . CLOSE_BRACE
17 statement: . opt_declaration unlabeled_statement
18 | . compound
31 compound: . OPEN_BRACE list_NE_unlstatement CLOSE_BRACE
73 opt_declaration: . IDENTIFIER COLON
74 | .
94 list_E_statement: list_E_statement . statement
CLOSE_BRACE shift, and go to state 68
IDENTIFIER shift, and go to state 69
OPEN_BRACE shift, and go to state 70
IDENTIFIER [reduce using rule 74 (opt_declaration)]
$default reduce using rule 74 (opt_declaration)
statement go to state 71
compound go to state 72
opt_declaration go to state 73
Can anyone help me? I've looked at http://www.gnu.org/software/bison/manual/html_node/Understanding.html but can't understand what this means.
I can post the full grammar if that would help.
Thank you!
Upvotes: 0
Views: 491
Reputation: 241911
The second conflict is a classic problem with "optional" elements. It's very tempting to write a rule for optional labels as you have done it, but the fact that optional_label
could produce nothing forces the parser to try to make a decision before it has enough information.
LR parsers must "reduce" (recognize) a non-terminal before absorbing any further tokens. They can lookahead at the next k tokens (the next 1 token for an LR(1) parser, which is what bison generates), but they can't tentatively use the token and later go back and do the reduction.
So when the parser is at the point where the next token, which is an identifier, should start a statement, it might be looking at a statement which starts with an identifier, or it might be looking at a label which starts with an identifier. It could tell the difference by looking at the colon which follows the identifier (if any) but it can't see that far ahead.
Now, if it weren't for the fact that it is required to reduce either an empty optional_declaration
or one containing a label, there would not be a problem. If you had written something like this:
statement: basic_statement | compound
basic_statement: unlabeled_statement | declaration unlabeled_statement
declaration: IDENTIFIER COLON
then the parser would not have to make a decision when it sees the identifier. It only has to make a decision when it reaches the end of a production; it is perfectly capable of pressing forward when there are two possible productions to complete. But when you force it to recognize an optional label, then it has to know whether the label was not there in order to reduce (recognize) the empty production.
For the first conflict, we can see from the output that there is some context in which the lookahead symbol is IDENTIFIER
and you could have either a return_options
or an identifier_list
. Since both of those productions can produce a single IDENTIFIER
, the parser will not know which one to reduce.
With the actual grammar available, it is easy to find the context in which both return_options IDENTIFIER
and identifier_list IDENTIFIER
are possible:
formal_parameter : expression_parameter | function_parameter
expression_parameter: identifier_list IDENTIFIER
function_parameter : return_options IDENTIFIER …
That grammar is not ambiguous. If IDENTIFIER IDENTIFIER
is the start of function_parameter
, then it must be followed by a (; if it is an expression_parameter
, then it must be followed by either , or ). But that's the second next token, which means you'd need an LR(2) parser.
So I will give my usual advice on handling LR(2) grammars. It is possible to rewrite an LR(k) grammar as an LR(1) grammar, regardless of the value of k, but the result is usually bloated and ugly. So if you are using bison and you are willing to live with the possibility that action evaluations could be slightly delayed, then your easiest solution is to ask bison to generate a GLR parser. Often, just adding %glr-parser
to the options section is enough.
It's worth noting that your grammar seems to be an uneasy mix between C and Pascal-like syntaxes. In C, the first token in a parameter is always a type; either the return type of a function, or the type of the following identifier. In Pascal, the last token in a parameter is the type. But in your grammar, sometimes the first token is the type and sometimes it's the last token. In a certain sense, it is this inconsistency which leads to the awkwardness in the grammar.
(Pascal has a lot more punctuation: the type is always preceded by a colon and a function parameter is preceded by the word function
. These extra tokens are not needed to make the grammar work, but it can be argued that they make the syntax easier to read by humans.)
Upvotes: 0