Tom

Reputation: 1301

Bison/yacc parser fails when tokens are not separated by a space - "unexpected $end"

Hi, I have a scenario where bison parses my input successfully only if a space separates two of the tokens.

Here is the situation: I am attempting to declare a variable:

int a = 31 ;

This input (read through yyin) parses successfully.

int a = 31;

This one does not parse successfully.

The error I receive is:

syntax error, unexpected $end, expecting TSEMI

Here is the relevant section of the bison code:

%token <string> TIDENTIFIER TINTEGER TDOUBLE
%token <token> TCEQUAL TCNE TCLT TCLE TCGT TCGE TASSIGN
%token <token> TLPAREN TRPAREN TLBRACE TRBRACE TCOMMA TDOT TSEMI
%token <token> TPLUS TMINUS TMUL TDIV

...

var_decl : ident ident TSEMI { $$ = new VarDel($1, $2); }
         | ident ident TASSIGN expr TSEMI {$$ = new VarDel($1, $2, $4);}
         ;

ident : TIDENTIFIER { $$ = new Var($1->c_str()); delete $1; }
      ;

expr : ident { $<ident>$ = $1; }
     | numeric
     ;

numeric : TINTEGER { $$ = new Num(atol($1->c_str())); delete $1; }
        | TDOUBLE { $$ = new Num(atof($1->c_str())); delete $1; }
        ;

And here is the relevant section of my flex file:


[ \t\n]                 ;
[a-zA-Z_][a-zA-Z0-9_]*  SAVE_TOKEN; return TIDENTIFIER;
[0-9]+.[0-9]*           SAVE_TOKEN; return TDOUBLE;
[0-9]+                  SAVE_TOKEN; return TINTEGER;
"="                     return TOKEN(TASSIGN);
"=="                    return TOKEN(TCEQUAL);
"!="                    return TOKEN(TCNE);
"<"                     return TOKEN(TCLT);
"<="                    return TOKEN(TCLE);
">"                     return TOKEN(TCGT);
">="                    return TOKEN(TCGE);
"("                     return TOKEN(TLPAREN);
")"                     return TOKEN(TRPAREN);
"{"                     return TOKEN(TLBRACE);
"}"                     return TOKEN(TRBRACE);
"."                     return TOKEN(TDOT);
","                     return TOKEN(TCOMMA);
"+"                     return TOKEN(TPLUS);
"-"                     return TOKEN(TMINUS);
";"                     return TOKEN(TSEMI);
"*"                     return TOKEN(TMUL);
"/"                     return TOKEN(TDIV);
.                       printf("Unknown token!\n"); yyterminate();


Why does it parse successfully when there is a space before the semicolon but not when there isn't one?

Thanks

Upvotes: 1

Views: 107

Answers (1)

rici

Reputation: 241841

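[0-9]+.[0-9]* should be [0-9]+\.[0-9]*. As written, the unescaped . matches any character other than a newline, so the pattern matches 31; as a single token. Because flex prefers the longest match, that beats the [0-9]+ rule: the lexer returns one TDOUBLE whose text is 31; and never produces TSEMI, which is why the parser reaches $end while still expecting TSEMI. With a space before the semicolon, the pattern swallows the space instead (31 followed by a blank), and the ; is then tokenised on its own.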
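
To see the difference between the two patterns outside flex, here is a rough sketch that uses std::regex as a stand-in for the scanner. This is an illustration only: std::regex is not flex, and matched_prefix_length is a hypothetical helper name, not part of either tool. It reports how many characters a pattern would consume at the start of the input, which is the quantity flex compares when it picks the longest match.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: length of the prefix of `input` that `pattern`
// matches starting at offset 0, or 0 if the pattern does not match there.
// match_continuous anchors the search at the beginning of the input,
// roughly the way a scanner only matches at the current position.
std::size_t matched_prefix_length(const std::string& pattern,
                                  const std::string& input) {
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(input, m, re,
                          std::regex_constants::match_continuous)) {
        return static_cast<std::size_t>(m.length(0));
    }
    return 0;
}
```

For the input 31; the unescaped pattern "[0-9]+.[0-9]*" consumes 3 characters, while the integer pattern "[0-9]+" consumes only 2, so the longer (wrong) match wins. The escaped pattern "[0-9]+\\.[0-9]*" consumes 0 characters on that input, which leaves the integer rule to match 31 and the ; to be returned as its own token.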

You would do well to enable flex debugging (the -d command-line flag) to see how it tokenises. Also, using atof silently hides the fact that the token is not a valid number. Consider using a safer string→number converter; you'll find one in the C++ standard library; in C, it would be strtod followed by a check that endptr is at the end of the string. (And you could do this conversion in the lexer, avoiding the unnecessary allocation and deallocation of a string.)
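
As a minimal sketch of the strtod-plus-endptr check, something along these lines would reject a token such as 31; instead of quietly converting it to 31. The function name parse_double_strict and the choice to treat out-of-range values as failures are assumptions made for this example, not something taken from the question's code.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts `text` to a double and reports success only
// if the entire string was consumed by the conversion.
bool parse_double_strict(const std::string& text, double& out) {
    if (text.empty()) {
        return false;
    }
    errno = 0;
    char* end = nullptr;
    const double value = std::strtod(text.c_str(), &end);
    if (end == text.c_str()) {
        return false;  // nothing was converted at all
    }
    if (*end != '\0') {
        return false;  // trailing characters, e.g. the ';' in "31;"
    }
    if (errno == ERANGE) {
        return false;  // assumption: treat overflow/underflow as an error
    }
    out = value;
    return true;
}
```

With this check, "31.5" and "31" convert successfully, while "31;" fails because strtod stops at the semicolon and end does not point at the terminating null character. A call like this could replace atof in the numeric action, or be moved into the lexer rule itself.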

Upvotes: 3
