Reputation: 1940
In the extremely simple example below, I want to recognize the language consisting of a single a
and ensure that no characters come after it.
File: example.y
%{
#include <stdio.h>
#include <ctype.h>
int yylex(void);
int yyerror(char *s);
%}
%token A
%token END
%token JUNK
%% /* Grammar Rules */
accept: A END { printf("language accepted!\n"); }
;
%%
File: example.in
%{
#include "ex.tab.h"
#define YY_NO_INPUT
%}
%option nounput
%%
a { printf("A found\n"); return A; }
<<EOF>> { printf("EOF found\n"); return END; }
. { printf("JUNK found\n"); return JUNK; }
%%
Compiling and running this program with the following test input file:
a
produces the following output:
A found
EOF found
language accepted!
EOF found
Error: syntax error
I think the program rejects my input because EOF is read twice. My question is: why is EOF being read twice, and how do I stop it?
Also, doing the above without the EOF rule causes inputs such as
abbbb
to print the "accept" message but then immediately fail because of the excess input. All I want is a single pass or fail, which is why I'm trying to use EOF to guarantee exactly one result.
Upvotes: 1
Views: 2920
Reputation: 1940
I was able to use the following solution to scan in an EOF with flex and pass it to Bison without getting caught up matching EOF a second time.
Make bison reduce to start symbol only if EOF is found
The solution uses a start condition to notice that EOF is next without ending the scan right away. When the "initial" EOF is matched, the scanner returns a token (so END can be sent to Bison) and switches to the new start condition; the next EOF match then returns 0 and completes the flex/bison parse naturally. At least that's my understanding of it.
Flex
/* Declare an exclusive start condition. */
%x REALLYEND
%option noinput nounput
%%
"END" { return END; }
. { return TOK; }
<INITIAL><<EOF>> { BEGIN(REALLYEND); return EOP; /* first EOF: trigger the start condition */ }
<REALLYEND><<EOF>> { return 0; /* second EOF: really end the scan */ }
%%
Bison
%%
prog : END EOP { printf ("ok\n"); }; /* EOP can be used just like END in my example */
%%
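To make the two-stage EOF idea easier to follow, here is a minimal hand-written sketch in plain C of what the start condition does. It is not flex output: the names `struct lexer`, `lexer_next`, and the token values are hypothetical, and every input character is simplified to a single `TOKEN_TOK`.

```c
#include <stddef.h>

/* Hypothetical token codes; 0 is the value a scanner returns at real EOF. */
enum { TOKEN_EOF = 0, TOKEN_TOK = 258, TOKEN_EOP = 259 };

/* Stand-in for flex's start conditions INITIAL and REALLYEND. */
enum lexer_state { STATE_INITIAL, STATE_REALLYEND };

struct lexer {
    const char *cursor;       /* next unread character of the input */
    enum lexer_state state;   /* current start condition */
};

/* Returns the next token, reporting end of input in two stages. */
int lexer_next(struct lexer *lx)
{
    if (*lx->cursor != '\0') {
        lx->cursor++;
        return TOKEN_TOK;               /* simplified: any character is TOK */
    }
    if (lx->state == STATE_INITIAL) {
        lx->state = STATE_REALLYEND;    /* plays the role of BEGIN(REALLYEND) */
        return TOKEN_EOP;               /* explicit end marker for the grammar */
    }
    return TOKEN_EOF;                   /* later calls: the real end of input */
}
```

With a `struct lexer` initialised to the input `"x"` and `STATE_INITIAL`, successive calls to `lexer_next` yield `TOKEN_TOK`, then `TOKEN_EOP`, then `TOKEN_EOF`.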
Upvotes: 3
Reputation: 190
The answer is threefold:
1) In most cases you should let your bison parser handle EOF. It will likely be the easiest way to go, but it is not always possible.
2) Handling the built-in flex <<EOF>> rule has some special requirements (obligatory link to the manual).
3) A note on point 1: if you find that your grammar appears to require an end-of-file token of some kind, that is OK (some variants of C require this), but it is considered bad form (some editors have settings to add newlines at the end of files, which can cause conflicts).
Upvotes: 0
Reputation: 241791
bison (and all yacc derivatives I know of except for lemon) will not reduce the start production unless it is followed by an EOF token. In effect, it modifies the grammar to something like this:
$accept: accept $end;
accept: A END {...}
Your END token is not the same as the built-in $end token. So bison will happily reduce the accept rule (and therefore trigger your printf, which seems to have a different message in your code than in your output, which suggests that they come from different versions of your code) but it will not reduce its own $accept rule, and consequently will report a syntax error.
It's certainly the case that flex is prepared to match <<EOF>> more than once. I believe it will continue to do so as long as you ask for more tokens, but I could be wrong; certainly, it will match twice. But that's not your problem. Your problem is that you're trying to force bison to do what it would do anyway, except that you've made it impossible for it to do that.
In short, let flex return 0 for EOF, which is what it wants to do, and trust bison to only accept input which is terminated by an EOF. That will make your code much simpler.
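As a rough illustration of why that is enough, here is a small hand-written C sketch that mimics the behaviour described above. It is not generated code: `next_token` and `parse_single_a` are hypothetical stand-ins for yylex and yyparse, token value 0 plays the role of $end, and newlines are skipped on the assumption that the scanner never returns a token for them.

```c
#include <stddef.h>

/* Hypothetical token codes; 0 plays the role of bison's built-in $end. */
enum token { TOK_END = 0, TOK_A = 258, TOK_JUNK = 259 };

/* Stand-in for yylex(): reads one token and advances *cursor. */
int next_token(const char **cursor)
{
    /* Assumption: newlines never become tokens, so skip them. */
    while (**cursor == '\n')
        (*cursor)++;
    if (**cursor == '\0')
        return TOK_END;            /* end of input: just return 0 */
    char c = *(*cursor)++;
    return c == 'a' ? TOK_A : TOK_JUNK;
}

/* Stand-in for yyparse(): returns 0 on success, 1 on a syntax error. */
int parse_single_a(const char *input)
{
    const char *cursor = input;

    /* accept: A */
    if (next_token(&cursor) != TOK_A)
        return 1;

    /* $accept: accept $end  (the check bison adds on its own) */
    return next_token(&cursor) == TOK_END ? 0 : 1;
}
```

Here `parse_single_a("a\n")` returns 0, while `parse_single_a("abbbb")` returns 1 because a JUNK token arrives where the end of input is required. The user grammar never mentions an END token; the single pass-or-fail result comes from the end-of-input check alone.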
(The tricky part is actually recognizing a sentence which does not go to the end of the input; for example, if you are embedding one language inside another, such as JavaScript or CSS inside of HTML. In that case, you do have to play some games, and I believe that is why lemon does not insert the usual augmented start rule.)
Upvotes: 1