Mojo Jojo
Mojo Jojo

Reputation: 366

Doubts regarding Compiler Construction (Flex/Bison)

Am trying to construct a simple compiler in my class, its second week and am totally stuck at these points: Am providing my simple.l as (flex and bison files are snipped to save space):

..snip..
end     {return(END);}
skip        {return(SKIP);}
in      {return(IN);}
integer {return(INTEGER);}
let     {return(LET);}
..snip..
[ \t\n\r]+  

and simple.y as :

%start program
%token LET IN END 
%token SKIP IF THEN ELSE WHILE DO READ WRITE FI ASSGNOP
%token NUMBER PERIOD COMMA SEMICOLON INTEGER
%token IDENTIFIER EWHILE LT
%left '-' '+'
%left '*' '/'
%right '^'
%%

program : LET declarations IN commands END SEMICOLON
declarations :  
|INTEGER id_seq IDENTIFIER PERIOD
;
id_seq:
|id_seq IDENTIFIER COMMA
;
commands : 
| commands command SEMICOLON
;
command : SKIP
;
exp : NUMBER
| IDENTIFIER
| '('exp')'
;
..snip..
%%

My first problem is when I compile and execute this, it properly accepts my input till the end but it does not stop at end; i.e it again comes to start state , isn't it supposed to terminate when it encounters an end :

On the input :

let
integer x.
in
skip;
end;

here is an output :

Starting parse
Entering state 0
Reading a token: let
Next token is token LET ()
Shifting token LET ()
Entering state 1
Reading a token: integer x.
Next token is token INTEGER ()
Shifting token INTEGER ()
Entering state 3
Reducing stack by rule 4 (line 22):
-> $$ = nterm id_seq ()
Stack now 0 1 3
Entering state 6
Reading a token: Next token is token IDENTIFIER ()
Shifting token IDENTIFIER ()
Entering state 8
Reading a token: Next token is token PERIOD ()
Shifting token PERIOD ()
Entering state 10
Reducing stack by rule 3 (line 20):
   $1 = token INTEGER ()
   $2 = nterm id_seq ()
   $3 = token IDENTIFIER ()
   $4 = token PERIOD ()
-> $$ = nterm declarations ()
Stack now 0 1
Entering state 4
Reading a token: in
Next token is token IN ()
Shifting token IN ()
Entering state 7
Reducing stack by rule 6 (line 25):
-> $$ = nterm commands ()
Stack now 0 1 4 7
Entering state 9
Reading a token: skip;
Next token is token SKIP ()
Shifting token SKIP ()
Entering state 13
Reducing stack by rule 8 (line 28):
   $1 = token SKIP ()
-> $$ = nterm command ()
Stack now 0 1 4 7 9
Entering state 19
Reading a token: Next token is token SEMICOLON ()
Shifting token SEMICOLON ()
Entering state 29
Reducing stack by rule 7 (line 26):
   $1 = nterm commands ()
   $2 = nterm command ()
   $3 = token SEMICOLON ()
-> $$ = nterm commands ()
Stack now 0 1 4 7
Entering state 9
Reading a token: end;
Next token is token END ()
Shifting token END ()
Entering state 12
Reading a token: Next token is token SEMICOLON ()
Shifting token SEMICOLON ()
Entering state 20
Reducing stack by rule 1 (line 18):
   $1 = token LET ()
   $2 = nterm declarations ()
   $3 = token IN ()
   $4 = nterm commands ()
   $5 = token END ()
   $6 = token SEMICOLON ()
-> $$ = nterm program ()
Stack now 0
Entering state 2
Reading a token:

Why is it ready to read a token again when I have entered end; ?? What am I missing ? Shouldn't it end here ? If I enter anything now it gives me following error :

Reading a token: let 
Next token is token LET ()
syntax error, unexpected LET, expecting $end
Error: popping nterm program ()
Stack now 0
Cleanup: discarding lookahead token LET ()
Stack now 0

My Second doubt is what should be the next step in implementing this compiler ? I mean what more steps are required between this and code generation part ? How do I implement Symbol table now ? and How do I make this parser accept code from file . Till now am providing input in terminal, what if I want to make this accept code from file like my_program.simple ? Thank You.

Upvotes: 0

Views: 394

Answers (1)

CapelliC
CapelliC

Reputation: 60004

declarations :  
|INTEGER id_seq IDENTIFIER PERIOD
;
...

I think that you're using a wrong syntax: you state that declarations (as well as idseq and commands) could be epsilon, i.e. an empty production. That because | it's the alternative operator. Alternative between an empty body and the actual pattern. Doesn't make sense.

I think could be the cause of your parser looping.

For a symbol table you can use a map (I hope you're generating C++), declared global outside the parser. Then insert symbols when you seen them.

Before to get a compiler, could be useful to have a working interpreter, it's easier and clarifies many aspects that will be reused building the compiler.

Upvotes: 1

Related Questions