Reputation: 31
I'm trying to write a Flex/Bison file from BNF Grammar. However, I get errors when I try to compile, and I'm not sure how to debug them.
BNF Grammer:
<exp>::=<list> | head(<list>)
<list>::=<num1>::<list> | <list>@<list> | tail(<list>) | [<numlist>]
<numlist>::=<empty> | <num1><num2>
<num2>::=<empty> | ,<num1><num2>
<num1>::=<D1><N> | <N> | head(<list>)
<D1>::=<D1><N> | <D2>
<D2>::=[1-9]
<N>::=[0-9]
My Flex File
%option noyywrap
%{
#include <stdlib.h>
#include <list>
using namespace std;
#define YYSTYPE list<int>
#include "listop.tab.h"
%}
hd head
tl tail
ap [@()\[\]]
con ::
n [0-9]
d2 [1-9]
d1 ({d2}{n}+)|{d2}
ws[ \t\n]+
%%
{d1} yylval.clear(); yylval.push_front(atoi(yytext)); return D1;
{n} yylval.clear(); yylval.push_front(atoi(yytext)); return N;
{hd} return HEAD;
{tl} return TAIL;
{ap} return *yytext;
{con} return CON;
{ws} /* eat up white spaces */
%%
My Bison File
%{
#include <math.h>
#include <stdio.h>
#include <iostream>
#include <list>
using namespace std;
#define YYSTYPE list<int>
void outputlist(list<int>);
int yylex(void);
int yyerror(const char*);
list<int> a;
%}
%token N
%token D1
%token HEAD
%token TAIL
%right CON
%left '@'
%% /* Grammer rules and actions follow */
input: /* empty */
| input exp ;
exp: list '\n' {outputlist($1);}
| HEAD '(' list ')' '\n' {cout <<$3.front();} ;
list: num1 CON list {$3.push_front($1.front()); $$=$3;}
| list '@' list {$1.splice($1.end(),$3); $$=$1;}
| TAIL '(' list ')'{$$ = $3;}
| '[' numlist ']' {$$ = $2;};
numlist: /* empty */
| num1 num2 {$1.splice($1.end(), $2); $$=$1;};
num2: /* empty */
| ',' num1 num2 {$2.splice($2.end(),$3); $$=$2;};
num1: D1 N { $1.splice($1.end(), $2); $$=$1;}
| N { $$ = $1;}
| HEAD '(' list ')' {list<int> a; a.push_front($3.front()); $$=a;};
%%
int main() { return yyparse();}
int yyerror(const char* s) { cout << "error" << endl; return 0;}
void outputlist(list<int> list1)
{
cout << '[';
for (int i = list1.size(); i > 1; i --)
{
cout << list1.front() << ',';
list1.pop_front();
}
cout << list1.front();
list1.pop_front();
cout << ']' << endl;
}
Since YYSTYPE
has been defined to be list<int>
in the C declaration part of both files, I'm using library functions to insert/remove the head, get first element, etc of a list.
Any help is appreciated. Thanks.
Edit: I have edited the flex & bison file above. My program now runs, but I don't get the correct output. This is what I get:
input: head([5,2,4])
output: ,error
correct output: 5
Upvotes: 2
Views: 5843
Reputation: 5883
Your code is not too difficult to debug using the tools available to you, such as bison, the C++ compiler and the bison manual for example. As you appear to be learning how to use flex and bison on simple exercises you may need a tutorial on how the debugging could be done. If you did not need a tutorial, perhaps another reader might. What follows will be a tutorial on debugging your example. I'm currently doing this with a Computer Science class so it will have a teacher tone.
First, I looked at your BNF. I note that you appear not have adhered rigidly to the BNF notation and the layout is a bit untidy. A better version would be:
<exp> ::= <list>
| head ( <list> )
<list> ::= <num1> :: <list>
| <list> @ <list>
| tail ( <list> )
| [ <numlist> ]
<numlist> ::= <empty>
| <num1> <num2>
<num2> ::= <empty>
| , <num1> <num2>
<num1> ::= <D1> <N>
| <N> | head ( <list> )
<D1> ::= <D1> <N>
| <D2>
<D2> ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<N> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<empty> ::=
See how the layout improves readability? You had used the []
symbols for two different purposes, as themselves and to indicate a character set. Sets of characters is a flex notation. The []
characters are used as meta-characters in extended BNF, but they are used to denote options and not sets. You had not defined the <empty>
non-terminal symbol. Although this example follows the strict notation, to prevent confusion between terminals and meta-symbols quotation marks would be used, as in:
<exp> ::= <list>
| head "(" <list> ")"
<list> ::= <num1> "::" <list>
| <list> "@" <list>
| tail "(" <list> ")"
| "[" <numlist> "]"
<numlist> ::= <empty>
| <num1> <num2>
<num2> ::= <empty>
| "," <num1> <num2>
<num1> ::= <D1> <N>
| <N> | head "(" <list> ")"
<D1> ::= <D1> <N>
| <D2>
<D2> ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<N> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<empty> ::=
Let's now move to looking at the lexical analysis. If we make the fix suggest by @rici, and add the missing comma, we have this rule:
ap [@()\[\],]
Now bison has an in-built debug capability which is very useful for understanding how the parse is working. Enabling this for your code enables it to be clearly seen where it fails. First we have to enable the tracing, as described in the manual. We need to do two things:
yydebug
to non-zero-t
(or --debug
) optionTherefore in your listop.y
file you should change the main()
function to be:
int main() {
#ifdef YYDEBUG
extern int yydebug;
yydebug = 1;
#endif
return yyparse();}
The rebuild bison:
bison listop.y -d -t
Now run flex and re-compile:
flex listop.l
g++ -o listop.exe listop.tab.c lex.yy.c -DYYDEBUG -lfl
Now, when we run, we get some diagnostic output (for your input of head([5,2,4])
):
Starting parse
Entering state 0
Reducing stack by rule 1 (line 25):
-> $$ = nterm input ()
Stack now 0
Entering state 1
Reading a token: head([5,2,4])
Next token is token HEAD ()
Shifting token HEAD ()
Entering state 5
Reading a token: Next token is token '(' ()
Shifting token '(' ()
Entering state 12
Reading a token: Next token is token '[' ()
Shifting token '[' ()
Entering state 7
Reading a token: Next token is token D1 ()
Shifting token D1 ()
Entering state 4
Reading a token: Next token is token ',' ()
error
Error: popping token D1 ()
Stack now 0 1 5 12 7
Error: popping token '[' ()
Stack now 0 1 5 12
Error: popping token '(' ()
Stack now 0 1 5
Error: popping token HEAD ()
Stack now 0 1
Error: popping nterm input ()
Stack now 0
Cleanup: discarding lookahead token ',' ()
Stack now 0
It can be seen from this line:
Reading a token: Next token is token D1 ()
that the digit 5 has been returned as the token D1
by the lexer. However consulting the BNF grammar, we can see that it should have been treated as the token N
. As D1
and N
contain much the same digits it could have chosen either one. Why did it choose D1
? This is because of the ordering of the pattern/action rules. The higher rules take precedence over the lower ones. To fix this it is necessary to change the order of the rules:
{d1} yylval.clear(); yylval.push_front(atoi(yytext)); return D1;
{n} yylval.clear(); yylval.push_front(atoi(yytext)); return N;
to:
{n} yylval.clear(); yylval.push_front(atoi(yytext)); return N;
{d1} yylval.clear(); yylval.push_front(atoi(yytext)); return D1;
and then build all the pieces again. If we try again, we the get the following longer trace:
Starting parse
Entering state 0
Reducing stack by rule 1 (line 25):
-> $$ = nterm input ()
Stack now 0
Entering state 1
Reading a token: head([5,2,4])
Next token is token HEAD ()
Shifting token HEAD ()
Entering state 5
Reading a token: Next token is token '(' ()
Shifting token '(' ()
Entering state 12
Reading a token: Next token is token '[' ()
Shifting token '[' ()
Entering state 7
Reading a token: Next token is token N ()
Shifting token N ()
Entering state 3
Reducing stack by rule 14 (line 38):
$1 = token N ()
-> $$ = nterm num1 ()
Stack now 0 1 5 12 7
Entering state 16
Reading a token: Next token is token ',' ()
Shifting token ',' ()
Entering state 24
Reading a token: Next token is token N ()
Shifting token N ()
Entering state 3
Reducing stack by rule 14 (line 38):
$1 = token N ()
-> $$ = nterm num1 ()
Stack now 0 1 5 12 7 16 24
Entering state 31
Reading a token: Next token is token ',' ()
Shifting token ',' ()
Entering state 24
Reading a token: Next token is token N ()
Shifting token N ()
Entering state 3
Reducing stack by rule 14 (line 38):
$1 = token N ()
-> $$ = nterm num1 ()
Stack now 0 1 5 12 7 16 24 31 24
Entering state 31
Reading a token: Next token is token ']' ()
Reducing stack by rule 11 (line 35):
-> $$ = nterm num2 ()
Stack now 0 1 5 12 7 16 24 31 24 31
Entering state 34
Reducing stack by rule 12 (line 36):
$1 = token ',' ()
$2 = nterm num1 ()
$3 = nterm num2 ()
-> $$ = nterm num2 ()
Stack now 0 1 5 12 7 16 24 31
Entering state 34
Reducing stack by rule 12 (line 36):
$1 = token ',' ()
$2 = nterm num1 ()
$3 = nterm num2 ()
-> $$ = nterm num2 ()
Stack now 0 1 5 12 7 16
Entering state 25
Reducing stack by rule 10 (line 34):
$1 = nterm num1 ()
$2 = nterm num2 ()
-> $$ = nterm numlist ()
Stack now 0 1 5 12 7
Entering state 15
Next token is token ']' ()
Shifting token ']' ()
Entering state 23
Reducing stack by rule 8 (line 32):
$1 = token '[' ()
$2 = nterm numlist ()
$3 = token ']' ()
-> $$ = nterm list ()
Stack now 0 1 5 12
Entering state 20
Reading a token: Next token is token ')' ()
Shifting token ')' ()
Entering state 28
Reading a token:
However, the expected result (5
) is still not output. Inspecting the action rules we can see that the outputlist
method is only called when a '\n'
token is matched and the trace does not show any '\n'
token. This is because they are removed as whitespace in the lexical analyser. The flex rules need to be changed to fix this:
ap [@()\[\],\n]
ws[ \t]+
and rebuild again. It now works, outputting the correct value:
Starting parse
Entering state 0
Reducing stack by rule 1 (line 25):
-> $$ = nterm input ()
Stack now 0
Entering state 1
Reading a token: head([5,2,4])
Next token is token HEAD ()
Shifting token HEAD ()
Entering state 5
Reading a token: Next token is token '(' ()
Shifting token '(' ()
Entering state 12
Reading a token: Next token is token '[' ()
Shifting token '[' ()
Entering state 7
Reading a token: Next token is token N ()
Shifting token N ()
Entering state 3
Reducing stack by rule 14 (line 38):
$1 = token N ()
-> $$ = nterm num1 ()
Stack now 0 1 5 12 7
Entering state 16
Reading a token: Next token is token ',' ()
Shifting token ',' ()
Entering state 24
Reading a token: Next token is token N ()
Shifting token N ()
Entering state 3
Reducing stack by rule 14 (line 38):
$1 = token N ()
-> $$ = nterm num1 ()
Stack now 0 1 5 12 7 16 24
Entering state 31
Reading a token: Next token is token ',' ()
Shifting token ',' ()
Entering state 24
Reading a token: Next token is token N ()
Shifting token N ()
Entering state 3
Reducing stack by rule 14 (line 38):
$1 = token N ()
-> $$ = nterm num1 ()
Stack now 0 1 5 12 7 16 24 31 24
Entering state 31
Reading a token: Next token is token ']' ()
Reducing stack by rule 11 (line 35):
-> $$ = nterm num2 ()
Stack now 0 1 5 12 7 16 24 31 24 31
Entering state 34
Reducing stack by rule 12 (line 36):
$1 = token ',' ()
$2 = nterm num1 ()
$3 = nterm num2 ()
-> $$ = nterm num2 ()
Stack now 0 1 5 12 7 16 24 31
Entering state 34
Reducing stack by rule 12 (line 36):
$1 = token ',' ()
$2 = nterm num1 ()
$3 = nterm num2 ()
-> $$ = nterm num2 ()
Stack now 0 1 5 12 7 16
Entering state 25
Reducing stack by rule 10 (line 34):
$1 = nterm num1 ()
$2 = nterm num2 ()
-> $$ = nterm numlist ()
Stack now 0 1 5 12 7
Entering state 15
Next token is token ']' ()
Shifting token ']' ()
Entering state 23
Reducing stack by rule 8 (line 32):
$1 = token '[' ()
$2 = nterm numlist ()
$3 = token ']' ()
-> $$ = nterm list ()
Stack now 0 1 5 12
Entering state 20
Reading a token: Next token is token ')' ()
Shifting token ')' ()
Entering state 28
Reading a token: Next token is token '\n' ()
Shifting token '\n' ()
Entering state 32
Reducing stack by rule 4 (line 28):
$1 = token HEAD ()
$2 = token '(' ()
$3 = nterm list ()
$4 = token ')' ()
$5 = token '\n' ()
5-> $$ = nterm exp ()
Stack now 0 1
Entering state 8
Reducing stack by rule 2 (line 26):
$1 = nterm input ()
$2 = nterm exp ()
-> $$ = nterm input ()
Stack now 0
Entering state 1
Reading a token: ^Z
Now at end of input.
Shifting token $end ()
Entering state 2
Stack now 0 1 2
Cleanup: popping token $end ()
Cleanup: popping nterm input ()
Actually, although the debugging got the program working, its still an awful way to do it. Its not using the lexical analyser to build the tokens and the parser to match the grammar. All the number recognition should really be done in the flex code. For example, why not just have a lexer like this:
%option noyywrap
%{
#include <stdlib.h>
#include <list>
using namespace std;
#define YYSTYPE list<int>
#include "listop.tab.h"
%}
hd head
tl tail
ap [@()\[\],\n]
con ::
n [0-9]
d2 [1-9]
num ({d2}+{n}?)|{n}
ws[ \t]+
%%
{num} return NUM; yylval.push_front(atoi(yytext));
{hd} return HEAD;
{tl} return TAIL;
{ap} return *yytext;
{con} return CON;
{ws} /* eat up white spaces */
%%
Then you can simplify the grammar in the parser somewhat and make an overall simpler implementation.
I'll leave that part as an exercise.
Upvotes: 18