Reputation: 1185
I have the following Bison grammar:
%error-verbose
%{
#include "node.h"
NBlock *programBlock;
#define YYDEBUG 1
extern int yylex();
void yyerror(const char *s) { printf("Error: %s\n", s); }
%}
%union {
Node *node;
NBlock *block;
NBody *body;
NHeader *header;
NExpression *expression;
NStatement *statement;
NIdentifier *identifier;
NVariableDeclaration *variableDeclaration;
NDoWhileStatement *doWhileStatement;
NWhileStatement *whileStatement;
NIfStatement *ifStatement;
NForStatement *forStatement;
std::vector<NVariableDeclaration*> *variableDeclarations;
std::vector<NExpression*> *expressions;
std::vector<NStatement*> *statements;
std:string *string;
int token;
}
/*
The %token directive is used to associate a type to a terminal symbol.
%token <type> 'terminal_list'
associates the specific type <type> to each terminal in 'terminal_list'.
The type <type> is the same used in the %union declaration
*/
%token <string> TIDENTIFIER TINTEGER TDOUBLE
%token <token> TCEQ TCNE TCLT TCLE TCGT TCGE TEQUAL
%token <token> TLPAREN TRPAREN TLBRACE TRBRACE TCOMMA TDOT
%token <token> TPLUS TMINUS TMUL TDIV TDO TDOUBLE_TYPE TINT_TYPE
%token <token> TELSE TFOR TIF TSEMICOLON TTHEN TWHILE
/*
The %type directive is used to associate a type to a nonterminal symbol.
%type <type> nonterminal_list
associates the specific type <type> to each nonterminal in 'nonterminal_list'.
The type <type> is the same used in the %union declaration
*/
%type <expression> expression term factor
%type <block> program body header tail statements
%type <statement> statement forStatement ifStatement doWhileStatement whileStatement variableDeclaration
%type <token> comparison
%type <string> identifier_type
/*
Operator precedence for mathematical operators
*/
%left TPLUS TMINUS
%left TMUL TDIV
%left TCEQ TCNE TCLT TCLE TCGT TCGE
/*
Start grammar symbol
*/
%start program
%%
program: TLBRACE body TRBRACE { printf("Reduce body to program\n"); }
;
body: header TLBRACE block TRBRACE tail { printf("Reduce header block tail to body\n"); }
;
header: TLBRACE variableDeclarations TRBRACE { printf("Reduce variableDeclarations to header\n"); }
| TLBRACE TRBRACE { printf("Reduce empty to header\n"); }
;
variableDeclarations: variableDeclaration TSEMICOLON { printf("Reduce variable declaration to header\n"); }
| variableDeclarations variableDeclaration TSEMICOLON { printf("Reduce variable declaration list to header\n"); }
;
tail: TLBRACE statements TRBRACE { printf("reduce statement list to tail\n"); }
| TLBRACE TRBRACE { printf("Reduce empty to tal\n"); }
;
statements: statement TSEMICOLON { printf("Reduce statement to statement list\n"); }
| statements statement TSEMICOLON { printf("Reduce statement list to statement list\n"); }
;
statement: ifStatement { printf("Reduce if to statement\n"); };
| forStatement { printf("Reduce for to statement\n"); };
| doWhileStatement { printf("Reduce doWhile to statement\n"); };
| whileStatement { printf("reduce while to statement\n"); }
| TIDENTIFIER TEQUAL expression { printf("Reduce assignment to expression\n"); }
;
forStatement: TFOR TLPAREN expression TSEMICOLON expression TSEMICOLON expression TRPAREN block { printf("Reduce for to for statement\n"); }
;
ifStatement: TIF expression TTHEN block { printf("Reduce if to if statement\n"); }
| TIF expression block TELSE block { printf("Reduce ifelse to if statement\n"); }
;
doWhileStatement: TDO block TWHILE expression { printf("reduce dowhile to while statement\n"); }
;
whileStatement: TWHILE block expression { printf("Reduce while to while statement\n"); }
;
block: TLBRACE statements TRBRACE { printf("Reduce statement list to block\n"); }
| TLBRACE TRBRACE { printf("Reduce empty to block\n"); }
;
variableDeclaration: identifier_type TIDENTIFIER { printf("reduce uninitialized identifier to variable declaration\n"); }
| identifier_type TIDENTIFIER TEQUAL expression { printf("Reduce initialized identifier to variable declaration\n"); }
;
identifier_type: TINT_TYPE { printf("Reduce int to identifier type\n"); }
| TDOUBLE_TYPE { printf("Reduce double to identifier type\n"); }
| { printf("Reduce empty to identifier type\n"); }
;
expression: term { printf("Reduce term to expresson\n"); }
| expression comparison expression { printf("Reduce comparison to expression\n"); }
| expression TPLUS term { printf("Reduce addition to expression\n"); }
| expression TMINUS term { printf("Reduce subtraction to expression\n"); }
;
term: factor { printf("Reduce factor to term\n"); }
| term TMUL factor { printf("Reduce multiplication to term\n"); }
| term TDIV factor { printf("Reduce division to term\n"); }
;
factor: TLPAREN expression TRPAREN { printf("Reduce nested expression to expression\n"); }
| TMINUS factor { printf("Reduce -factor to factor\n"); }
| TIDENTIFIER { printf("Reduce identifier to factor\n"); }
| TINTEGER { printf("Reduce integer to numeric\n"); }
| TDOUBLE { printf("Reduce double to numeric\n"); }
;
comparison: TCEQ { printf("Reduce eq to comparison\n"); }
| TCNE { printf("Reduce ne to comparison\n"); }
| TCLT { printf("Reduce lt to comparison\n"); }
| TCLE { printf("Reduce le to comparison\n"); }
| TCGT { printf("reduce gt to comparison\n"); }
| TCGE { printf("Reduce ge to comparison\n"); }
;
and I get 8 shift/reduce conflicts which I don't know how to fix.
The following is a piace of the parser.output file which I generated using the --report=all parameter. It is the state which is affected by the 8 shift/reduce conflicts:
State 79
29 expression: expression . comparison expression
29 | expression comparison expression . [TCEQ, TCNE, TCLT, TCLE, TCGT, TCGE, TRPAREN, TLBRACE, TPLUS, TMINUS, TSEMICOLON, TTHEN]
30 | expression . TPLUS term
31 | expression . TMINUS term
40 comparison: . TCEQ
41 | . TCNE
42 | . TCLT
43 | . TCLE
44 | . TCGT
45 | . TCGE
TCEQ shift and go to state 56
TCNE shift and go to state 57
TCLT shift and go to state 58
TCLE shift and go to state 59
TCGT shift and go to state 60
TCGE shift and go to state 61
TPLUS shift and go to state 62
TMINUS shift and go to state 63
TCEQ [reduction with rule 29 (expression)]
TCNE [reduction with rule 29 (expression)]
TCLT [reduction with rule 29 (expression)]
TCLE [reduction with rule 29 (expression)]
TCGT [reduction with rule 29 (expression)]
TCGE [reduction with rule 29 (expression)]
TPLUS [reduction with rule 29 (expression)]
TMINUS [reduction with rule 29 (expression)]
$default reduction with rule 29 (expression)
comparison go to state 64
If I understand well the problem is that the parser does not know if going on reading another pice of text or reducing immediately the rule expression: expression comparison expression.
I would say that reducing immediately is correct. But if I this is correct, then how do I enforce immediate reduction insted of shifting?
Upvotes: 1
Views: 1801
Reputation: 1185
Thanks for your answers. Here is the complete Bison grammar:
%error-verbose
%{
#include "node.h"
#include <iostream>
#include <string>
NBlock *programBlock;
#define YYDEBUG 1
extern int yylex();
void yyerror(const char *) { printf("Error: %s\n", s); }
%}
%union {
Node *node;
NBlock *block;
NBody *body;
NHeader *header;
NExpression *expression;
NStatement *statement;
NIdentifier *identifier;
NVariableDeclaration *variableDeclaration;
NDoWhileStatement *doWhileStatement;
NWhileStatement *whileStatement;
NIfStatement *ifStatement;
NForStatement *forStatement;
std::vector<NVariableDeclaration*> *variableDeclarations;
std::vector<NExpression*> *expressions;
std::vector<NStatement*> *statements;
std:string *string;
int token;
}
%token <string> TIDENTIFIER TINTEGER TDOUBLE
%token <token> TCEQ TCNE TCLT TCLE TCGT TCGE TEQUAL
%token <token> TLPAREN TRPAREN TLBRACE TRBRACE TCOMMA TDOT
%token <token> TPLUS TMINUS TMUL TDIV TDO TDOUBLE_TYPE TINT_TYPE
%token <token> TELSE TFOR TIF TSEMICOLON TTHEN TWHILE TLSQUARE TRSQUARE
%type <expression> expression term factor
%type <block> program body header tail
%type <statements> statements
%type <statement> statement
%type <forStatement> forStatement
%type <ifStatement> ifStatement
%type <doWhileStatement> doWhileStatement
%type <whileStatement> whileStatement
%type <variableDeclaration> variableDeclaration
%type <token> comparison
%type <string> identifier_type
%type <variableDeclarations> variableDeclarations
/*
Start grammar symbol
*/
%start program
%%
program: TLBRACE body TRBRACE { printf("Reduce body to program\n"); }
;
body: header TLBRACE block TRBRACE tail { printf("Reduce header block tail to body\n"); }
;
header: TLBRACE variableDeclarations TRBRACE { printf("Reduce variableDeclarations to header\n"); }
| TLBRACE TRBRACE { printf("Reduce empty to header\n"); }
;
variableDeclarations: variableDeclaration TSEMICOLON { printf("Reduce variable declaration to header\n"); }
| variableDeclarations variableDeclaration TSEMICOLON { printf("Reduce variable declaration list to header\n"); }
;
tail: TLBRACE statements TRBRACE { printf("reduce statement list to tail\n"); }
| TLBRACE TRBRACE { printf("Reduce empty to tal\n"); }
;
statements: statement TSEMICOLON { printf("Reduce statement to statement list\n"); }
| statements statement TSEMICOLON { printf("Reduce statement list to statement list\n"); }
;
statement: ifStatement { printf("Reduce if to statement\n"); };
| forStatement { printf("Reduce for to statement\n"); };
| doWhileStatement { printf("Reduce doWhile to statement\n"); };
| whileStatement { printf("reduce while to statement\n"); }
| TIDENTIFIER TEQUAL expression { printf("Reduce assignment to expression\n"); }
| TIDENTIFIER TLSQUARE TINTEGER TRSQUARE TEQUAL expression { printf("Reduce array assignment to expression\n"); }
;
forStatement: TFOR TLPAREN expression TSEMICOLON expression TSEMICOLON expression TRPAREN block { printf("Reduce for to for statement\n"); }
;
ifStatement: TIF expression TTHEN block { printf("Reduce if to if statement\n"); }
| TIF expression TTHEN block TELSE block { printf("Reduce ifelse to if statement\n"); }
;
doWhileStatement: TDO block TWHILE expression { printf("reduce dowhile to while statement\n"); }
;
whileStatement: TWHILE block expression { printf("Reduce while to while statement\n"); }
;
block: TLBRACE statements TRBRACE { printf("Reduce statement list to block\n"); }
| TLBRACE TRBRACE { printf("Reduce empty to block\n"); }
;
variableDeclaration: identifier_type TIDENTIFIER { printf("Reduce uninitialized identifier to variable declaration\n"); }
| identifier_type TIDENTIFIER TLSQUARE TINTEGER TRSQUARE { printf("Reduce array to variable declaration\n"); }
| identifier_type TIDENTIFIER TEQUAL expression { printf("Reduce initialized identifier to variable declaration\n"); }
| identifier_type TIDENTIFIER TLSQUARE TINTEGER TRSQUARE TEQUAL expression { printf("Reduce initialized array to variable declaration\n"); }
;
identifier_type: TINT_TYPE { printf("Reduce int to identifier type\n"); }
| TDOUBLE_TYPE { printf("Reduce double to identifier type\n"); }
| { printf("Reduce empty to identifier type\n"); }
;
expression: add-expression { printf("Reduce add-expression to expression\n"); }
|add-expression comparison add-expression { printf("Reduce add-expression comparison add-expression to expression\n"); }
add-expression: term { printf("Reduce term to expresson\n"); }
| add-expression TPLUS term { printf("Reduce addition to expression\n"); }
| add-expression TMINUS term { printf("Reduce subtraction to expression\n"); }
;
term: factor { printf("Reduce factor to term\n"); }
| term TMUL factor { printf("Reduce multiplication to term\n"); }
| term TDIV factor { printf("Reduce division to term\n"); }
;
factor: TLPAREN expression TRPAREN { printf("Reduce nested expression to expression\n"); }
| TMINUS factor { printf("Reduce -factor to factor\n"); }
| TIDENTIFIER { printf("Reduce identifier to factor\n"); }
| TINTEGER { printf("Reduce integer to numeric\n"); }
| TDOUBLE { printf("Reduce double to numeric\n"); }
;
comparison: TCEQ { printf("Reduce eq to comparison\n"); }
| TCNE { printf("Reduce ne to comparison\n"); }
| TCLT { printf("Reduce lt to comparison\n"); }
| TCLE { printf("Reduce le to comparison\n"); }
| TCGT { printf("reduce gt to comparison\n"); }
| TCGE { printf("Reduce ge to comparison\n"); }
;
Upvotes: 0
Reputation: 126203
Your grammar is ambiguous -- an input like 1 < 2 < 3
can be parsed as either (1 < 2) < 3
or 1 < (2 < 3)
.
There are two ways to deal with this -- either add %left
/%right
/%nonassoc
directives to use bison's internal precedence handling, or introduce additional levels of rules to handle it.
Now for your other operators (*
/
+
-
) you're doing BOTH of these -- this is generally a mistake and you want to do just one or the other. But if you do do both, the additional rules will take precedence and the precedence directives will be ignored and sometimes cause suprising problems.
The "normal" handling for relations like this is to say you can't have multiple of them (1 < 2 < 3
is a syntax error and should not be parsed either left or right recursive.) To do this with additional rules, you'd change you expression rule to:
expression: add_expression
| add_expression comparison add_expression
;
add_expression: term
| add_expression TPLUS term
| add_expression TMINUS term
;
to use precedence directives, get rid of term
, factor
, and comparison
(moving them all into expression
and add:
%nonassoc TCEQ TCNE TCLT TCLE TCGT TCGE
%left TPLUS TMINUS
%left TMUL TDIV
Upvotes: 5
Reputation: 46960
The problem is that associativity rules (like your %left
directive) don't work through grammar indirection. If you do away with the comparison
rule and pull all the comparison operators up into the expression
rule, the problem goes away. (I verified this with Bison v2.4.1.)
This may be inconvienient because it makes expression
more verbose than you'd like. A workaround is to push the decision back to the scanner. Define COMPARISON
as a single token, and pass the flavor of comparison as a separate enumerated type through yylval
so it's available as $n
.
Upvotes: 1