Reputation: 639
I have two VERY tiny files (in an attempt to remove all other confounding variables) written in Lex and Yacc, respectively.
Lex:
%{
#include <stdlib.h>
#include "y.tab.h"
void yyerror(char *);
%}
%%
[a] {
yylval = *yytext;
return VAR;
}
[ \t\n] ;
. yyerror("invalid character");
%%
int yywrap(void) {
return 1;
}
Yacc:
%token VAR
%{
void yyerror(char *);
int yylex(void);
#include <stdio.h>
%}
%%
butts:
VAR { printf("%d\n", $1); }
%%
void yyerror(char *s) {
fprintf(stderr, "%s\n", s);
}
int main(void) {
#if YYDEBUG
yydebug = 1;
#endif
yyparse();
return 0;
}
and when I compile the whole thing (with the -DYYDEBUG option), I get the output:
Starting parse
Entering state 0
Reading a token: a
Next token is token VAR ()
Shifting token VAR ()
Entering state 1
Reducing stack by rule 1 (line 12):
$1 = token VAR ()
97
-> $$ = nterm butts ()
Stack now 0
Entering state 2
Reading a token: a
Next token is token VAR ()
syntax error
Error: popping nterm butts ()
Stack now 0
Cleanup: discarding lookahead token VAR ()
Stack now 0
when inputting "a" twice. The first time I enter "a" at the Reading a token: prompt,
the program seems to run fine, but the second time, it vomits.
I am at a loss as to why this is so.
Upvotes: 0
Views: 494
Reputation: 5893
This is because your grammar file says that only one "a" is permitted, so a second one is a syntax error. Your grammar rule says:
butts: VAR
nothing more, nothing less.
Thus the only valid program that your grammar matches is:
a
Any other input, such as:
aa
or:
a
a
will cause a syntax error. Your rule very explicitly says one VAR only; not a sequence of VARs; not a few VARs. Just one VAR.
If you want it to match more than one in the input, you have to say so. The grammar thus has to describe the permitted sequence:
butts: VAR | butts VAR
Then it will permit a sequence of one or more VARs.
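As a rough sketch, the rules section of your Yacc file could look something like this with the recursive rule in place. I am assuming you still want each VAR printed as it is matched; the action on the second alternative (and its use of $2, the value of the VAR that follows butts) is my addition and not something from your original file:

```yacc
%%
butts:
      VAR        { printf("%d\n", $1); }  /* first VAR in the input */
    | butts VAR  { printf("%d\n", $2); }  /* each later VAR; $2 is its value */
    ;
%%
```

With the rest of your files unchanged, entering "a" twice should then print 97 for each one instead of failing on the second token.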
Is that clearer?
Upvotes: 3