Reputation: 366
Am trying to construct a simple compiler in my class, its second week and am totally stuck at these points: Am providing my simple.l as (flex and bison files are snipped to save space):
..snip..
end {return(END);}
skip {return(SKIP);}
in {return(IN);}
integer {return(INTEGER);}
let {return(LET);}
..snip..
[ \t\n\r]+
and simple.y as :
%start program
%token LET IN END
%token SKIP IF THEN ELSE WHILE DO READ WRITE FI ASSGNOP
%token NUMBER PERIOD COMMA SEMICOLON INTEGER
%token IDENTIFIER EWHILE LT
%left '-' '+'
%left '*' '/'
%right '^'
%%
program : LET declarations IN commands END SEMICOLON
declarations :
|INTEGER id_seq IDENTIFIER PERIOD
;
id_seq:
|id_seq IDENTIFIER COMMA
;
commands :
| commands command SEMICOLON
;
command : SKIP
;
exp : NUMBER
| IDENTIFIER
| '('exp')'
;
..snip..
%%
My first problem is when I compile and execute this, it properly accepts my input till the end but it does not stop at end; i.e it again comes to start state , isn't it supposed to terminate when it encounters an end :
On the input :
let
integer x.
in
skip;
end;
here is an output :
Starting parse
Entering state 0
Reading a token: let
Next token is token LET ()
Shifting token LET ()
Entering state 1
Reading a token: integer x.
Next token is token INTEGER ()
Shifting token INTEGER ()
Entering state 3
Reducing stack by rule 4 (line 22):
-> $$ = nterm id_seq ()
Stack now 0 1 3
Entering state 6
Reading a token: Next token is token IDENTIFIER ()
Shifting token IDENTIFIER ()
Entering state 8
Reading a token: Next token is token PERIOD ()
Shifting token PERIOD ()
Entering state 10
Reducing stack by rule 3 (line 20):
$1 = token INTEGER ()
$2 = nterm id_seq ()
$3 = token IDENTIFIER ()
$4 = token PERIOD ()
-> $$ = nterm declarations ()
Stack now 0 1
Entering state 4
Reading a token: in
Next token is token IN ()
Shifting token IN ()
Entering state 7
Reducing stack by rule 6 (line 25):
-> $$ = nterm commands ()
Stack now 0 1 4 7
Entering state 9
Reading a token: skip;
Next token is token SKIP ()
Shifting token SKIP ()
Entering state 13
Reducing stack by rule 8 (line 28):
$1 = token SKIP ()
-> $$ = nterm command ()
Stack now 0 1 4 7 9
Entering state 19
Reading a token: Next token is token SEMICOLON ()
Shifting token SEMICOLON ()
Entering state 29
Reducing stack by rule 7 (line 26):
$1 = nterm commands ()
$2 = nterm command ()
$3 = token SEMICOLON ()
-> $$ = nterm commands ()
Stack now 0 1 4 7
Entering state 9
Reading a token: end;
Next token is token END ()
Shifting token END ()
Entering state 12
Reading a token: Next token is token SEMICOLON ()
Shifting token SEMICOLON ()
Entering state 20
Reducing stack by rule 1 (line 18):
$1 = token LET ()
$2 = nterm declarations ()
$3 = token IN ()
$4 = nterm commands ()
$5 = token END ()
$6 = token SEMICOLON ()
-> $$ = nterm program ()
Stack now 0
Entering state 2
Reading a token:
Why is it ready to read a token again when I have entered end; ?? What am I missing ? Shouldn't it end here ? If I enter anything now it gives me following error :
Reading a token: let
Next token is token LET ()
syntax error, unexpected LET, expecting $end
Error: popping nterm program ()
Stack now 0
Cleanup: discarding lookahead token LET ()
Stack now 0
My Second doubt is what should be the next step in implementing this compiler ? I mean what more steps are required between this and code generation part ? How do I implement Symbol table now ? and How do I make this parser accept code from file . Till now am providing input in terminal, what if I want to make this accept code from file like my_program.simple ? Thank You.
Upvotes: 0
Views: 394
Reputation: 60004
declarations :
|INTEGER id_seq IDENTIFIER PERIOD
;
...
I think that you're using a wrong syntax: you state that declarations
(as well as idseq
and commands
) could be epsilon
, i.e. an empty production. That because |
it's the alternative
operator. Alternative between an empty body and the actual pattern. Doesn't make sense.
I think could be the cause of your parser looping.
For a symbol table you can use a map (I hope you're generating C++), declared global outside the parser. Then insert symbols when you seen them.
Before to get a compiler, could be useful to have a working interpreter, it's easier and clarifies many aspects that will be reused building the compiler.
Upvotes: 1