Reputation: 157
Here are part of my grammar:
expr_address
: expr_address_category expr_opt { $$ = new ExprAddress($1,*$2);}
| axis axis_data { $$ = new ExprAddress($1,*$2);}
;
axis_data
: expr_opt { $$ = $1;}
| sign { if($1 == MINUS)
$$ = new IntergerExpr(-1000000000);
else if($1 == PLUS)
$$ = new IntergerExpr(+1000000000);}
;
expr_opt
: { $$ = new IntergerExpr(0);}
| expr { $$ = $1;}
;
expr_address_category
: I { $$ = NCAddress_I;}
| J { $$ = NCAddress_J;}
| K { $$ = NCAddress_K;}
;
axis
: X { $$ = NCAddress_X;}
| Y { $$ = NCAddress_Y;}
| Z { $$ = NCAddress_Z;}
| U { $$ = NCAddress_U;}
| V { $$ = NCAddress_V;}
| W { $$ = NCAddress_W;}
;
expr
: '[' expr ']' {$$ = $2;}
| COS parenthesized_expr {$$ = new BuiltinMethodCallExpr(COS,*$2);}
| SIN parenthesized_expr {$$ = new BuiltinMethodCallExpr(SIN,*$2);}
| ATAN parenthesized_expr {$$ = new BuiltinMethodCallExpr(ATAN,*$2);}
| SQRT parenthesized_expr {$$ = new BuiltinMethodCallExpr(SQRT,*$2);}
| ROUND parenthesized_expr {$$ = new BuiltinMethodCallExpr(ROUND,*$2);}
| variable {$$ = $1;}
| literal
| expr '+' expr {$$ = new BinaryOperatorExpr(*$1,PLUS,*$3);}
| expr '-' expr {$$ = new BinaryOperatorExpr(*$1,MINUS,*$3);}
| expr '*' expr {$$ = new BinaryOperatorExpr(*$1,MUL,*$3);}
| expr '/' expr {$$ = new BinaryOperatorExpr(*$1,DIV,*$3);}
| sign expr %prec UMINUS {$$ = new UnaryOperatorExpr($1,*$2);}
| expr EQ expr {$$ = new BinaryOperatorExpr(*$1,EQ,*$3);}
| expr NE expr {$$ = new BinaryOperatorExpr(*$1,NE,*$3);}
| expr GT expr {$$ = new BinaryOperatorExpr(*$1,GT,*$3);}
| expr GE expr {$$ = new BinaryOperatorExpr(*$1,GE,*$3);}
| expr LT expr {$$ = new BinaryOperatorExpr(*$1,LT,*$3);}
| expr LE expr {$$ = new BinaryOperatorExpr(*$1,LE,*$3);}
;
variable
: d_h_address {$$ = new AddressExpr(*$1);}
;
d_h_address
: D INTEGER_LITERAL { $$ = new IntAddress(NCAddress_D,$2);}
| H INTEGER_LITERAL { $$ = new IntAddress(NCAddress_H,$2);}
;
I hope my grammar support that like:
H100=20;
X;
X+0;
X+;
X+H100; //means H100 variable ref
The top two are same with X0; By the way,sign -> +/-;
But bison report conflicts,the key part of bison.output:
State 108
11 expr: sign . expr
64 axis_data: sign .
INTEGER_LITERAL shift, and go to state 93
REAL_LITERAL shift, and go to state 94
'+' shift, and go to state 74
'-' shift, and go to state 75
COS shift, and go to state 95
SIN shift, and go to state 96
ATAN shift, and go to state 97
SQRT shift, and go to state 98
ROUND shift, and go to state 99
D shift, and go to state 35
H shift, and go to state 36
'[' shift, and go to state 100
D [reduce using rule 64 (axis_data)]
H [reduce using rule 64 (axis_data)]
$default reduce using rule 64 (axis_data)
State 69
62 expr_address: axis . axis_data
INTEGER_LITERAL shift, and go to state 93
REAL_LITERAL shift, and go to state 94
'+' shift, and go to state 74
'-' shift, and go to state 75
COS shift, and go to state 95
SIN shift, and go to state 96
ATAN shift, and go to state 97
SQRT shift, and go to state 98
ROUND shift, and go to state 99
D shift, and go to state 35
H shift, and go to state 36
'[' shift, and go to state 100
D [reduce using rule 65 (expr_opt)]
H [reduce using rule 65 (expr_opt)]
$default reduce using rule 65 (expr_opt)
State 68
61 expr_address: expr_address_category . expr_opt
INTEGER_LITERAL shift, and go to state 93
REAL_LITERAL shift, and go to state 94
'+' shift, and go to state 74
'-' shift, and go to state 75
COS shift, and go to state 95
SIN shift, and go to state 96
ATAN shift, and go to state 97
SQRT shift, and go to state 98
ROUND shift, and go to state 99
D shift, and go to state 35
H shift, and go to state 36
'[' shift, and go to state 100
D [reduce using rule 65 (expr_opt)]
H [reduce using rule 65 (expr_opt)]
$default reduce using rule 65 (expr_opt)
I don't know how to deal with this,thanks advance.
EDIT: I make a minimal grammar:
%{
#include <stdio.h>
extern "C" int yylex();
void yyerror(const char *s) { printf("ERROR: %s/n", s); }
%}
%token PLUS '+' MINUS '-'
%token D H I J K X Y Z INT
/*%type sign expr var expr_address_category expr_opt
%type axis */
%start word_list
%%
/*Above grammar lost this rule,it makes ambiguous*/
word_list
: word
| word_list word
;
sign
: PLUS
| MINUS
;
expr
: var
| sign expr
| '[' expr ']'
;
var
: D INT
| H INT
;
word
: expr_address
| var '=' expr
;
expr_address
: expr_address_category expr_opt
/*| '(' axis sign ')'*/
| axis sign
;
expr_opt
: /* empty */
| expr
;
expr_address_category
: I
| J
| K
| axis
;
axis
: X
| Y
| Z
;
%%
and I hope it can support:
X;
X0;
X+0; //the top three are same with X0
X+;
X+H100; //this means X's data is ref +H100;
X+H100=10; //two word on a block,X+ and H100=10;
XH100=10; //two word on a block,X and H100=10;
EDIT2:
The above EDIT lost this rule.
block
: word_list ';'
| ';'
;
Because I have to allow such grammar:
H000 = 100 H001 = 200 H002 = 300;
Upvotes: 0
Views: 947
Reputation: 241931
This is essentially the classic LR(2) grammar, except that in your case it is LR(3) because your variables consist of two tokens [Note 1]:
var : D INT | H INT
The basic problem is the concatenation of words without separators:
word_list : word | word_list word
combined with the fact that one of the options for word
ends with an optional var
:
word: expr_address
expr_address: expr_address_category expr_opt
while the other one starts with a var
:
word: var '=' expr
The = makes this unambiguous, since nothing in an expr
can contain that symbol. But at the point where a decision needs to be made, the = is not visible, because the lookahead is the first token of a var
-- either an H
or a D
-- and the equals sign is still two tokens away.
This LR(2) grammar is very similar to the grammar used by yacc/bison itself, a fact which I always find to be ironic, because the grammar for yacc does not require ; between productions:
production: SYMBOL ':' | production SYMBOL /* Lots of detail omitted */
As with your grammar, this makes it impossible to know whether a SYMBOL
should be shifted or trigger a reduce because the disambiguating : is still not visible.
Since the grammar is (I assume) unambiguous, and bison can now generate GLR parsers, that will be the simplest solution: just add
%glr-parser
to your bison prologue (but read the section of the bison manual on GLR parsers to understand the trade-off).
Note that the shift-reduce conflicts will still be reported as warnings; since it is impossible to reliably decide whether a grammar is ambiguous, bison doesn't attempt to do so and ambiguities will be reported at run-time if they exist.
You should also fix the issue mentioned in @ChrisDodd's answer regarding the refactoring of expr_address
(although with a GLR parser it is not strictly necessary).
If, for whatever reason, you feel that a GLR parser will not meet your needs, you could use the solution in most implementations of yacc (including bison), which is a hack in the lexical scanner. The basic idea is to mark whether a symbol is followed by a colon or not in the lexer, so that the above production could be rewritten as:
production: SYMBOL_COLON | production SYMBOL
This solution would work for you if you were willing to combine the letter and the number into a single token:
word: expr_address expr_opt
| VARIABLE_EQUALS expr
// ...
expr: VARIABLE
My preference is to do this transformation in a wrapper around the lexer, which keeps a (one-element) queue of pending tokens:
/* The use of static variables makes this yylex wrapper unreliable
* if it is reused after a syntax error.
*/
int yylex_wrapper() {
static int saved_token = -1;
static YYSTYPE saved_yylval = {0};
int token = saved_token;
saved_token = -1;
yylval = saved_yylval;
// Read a new token only if we don't have one in the queue.
if (token < 0) token = yylex();
// If the current token is IDENTIFIER, check the next token
if (token == IDENTIFIER) {
// Read the next token into the queue (saved_token / saved_yylval)
YYSTYPE temp_val = yylval;
saved_token = yylex();
saved_yylval = yylval;
yylval = temp_val;
// If the second token is '=', then modify the current token
// and delete the '=' from the queue
if (saved_token == '=') {
saved_token = -1;
token = IDENTIFIER_EQUALS;
}
}
return token;
}
Personally, I would start by making a var
a single token (do you really want to allow people to write:
H /* Some comment in the middle of the variable name */ 100
but that's not going to solve any problems; it merely reduces the grammar's lookahead requirement from LR(3) to LR(2).
Upvotes: 1
Reputation: 126536
The main problem is that it can't figure out where one word
in a word_list
ends and the next one begins, because there is no separator token between words. This is in contrast to your examples, which all have ;
terminators. So that suggests one obvious fix -- put in the ;
separators:
word: expr_address ';'
| var '=' expr ';'
That fixes most of the problems, but leaves a lookahead conflict where it can't decide whether an axis
is an expr_address_category
or not when the lookahead is a sign
, because it depends on whether there's an expr
after the sign or not. You can fix that by refactoring to defer deciding:
expr_address
: expr_address_category expr_opt
| axis expr_opt
| axis sign
..and remove axis
from expr_address_category
Upvotes: 0