Reputation: 11
I learned about flex and bison in school over the summer, and now I want to dive a little deeper. I'm having trouble understanding the documentation for Bison 3.0.2. Maybe some of you could help me out. I want to parse a string representing an equation, and at the same time, fill out a data structure which contains information about what was parsed. For instance, say I have (ax+b)^2. I want the parser to generate a structure containing a string and an integer constant that looks like the following.
( BEGINGROUP
a VARIABLE
x VARIABLE
+ ADDITION
b VARIABLE
) ENDGROUP
I have created a language specification using flex, and I have created a grammar using bison. All that's needed is to get the two to put information into the structures. I have some code which works in kind of the way I want it to, but I can't help but think I'm missing something. In the Bison documentation examples, I see them using $$ or $1 which they say is to check semantic values? When I print the semantic values, I always get zero. Anyway, my code is posted below.
math.l
%{
#include "equation.h"
Equation* equ;
void setEquation(Equation* equation) {
equ = equation;
}
%}
/* Definitions */
space [ \t]
digit [0-9]
letter [a-zA-Z]
number ({digit}+|{digit}+"."{digit}+|"."{digit}+|{digit}+".")
variable {letter}
/* actions */
%%
{space} ;
{number} equ->addElement(yytext, Equation::number); return(1);
{variable} equ->addElement(yytext, Equation::variable); return(2);
"+" equ->addElement(yytext, Equation::addition); return(10); /* Basic operators */
"-" return(11);
"*" return(12);
"/" return(13);
"^" return(14);
"log" return(15);
"sin" return(20); /* Trigonometric Functions */
"cos" return(21);
"tan" return(22);
"csc" return(23);
"sec" return(24);
"cot" return(25);
"arcsin" return(26);
"arccos" return(27);
"arctan" return(28);
"(" equ->addElement(yytext, Equation::begGroup); return(30); /* Grouping Operators */
")" equ->addElement(yytext, Equation::endGroup); return(31);
"[" return(32);
"]" return(33);
"," return(34);
. fprintf(stderr, "Error on character %s\n", yytext);
math.y
/*
* Implement grammer for equations
*/
%{
#include "lex.yy.c"
#include "equation.h"
#include <iostream>
int yylex(void);
int yyerror(const char *msg);
void output(const char* where) {
std::cout << where << ": " << yytext << std::endl;
}
%}
%token e_num 1
%token e_var 2
%token e_plus 10
%token e_minus 11
%token e_mult 12
%token e_div 13
%token e_pow 14
%token e_log 15
%token e_sin 20
%token e_cos 21
%token e_tan 22
%token e_csc 23
%token e_sec 24
%token e_cot 25
%token e_asin 26
%token e_acos 27
%token e_atan 28
%token lparen 30
%token rparen 31
%token slparen 32
%token srparen 33
%token comma 34
%start Expression
%%
Expression : Term MoreTerms
| e_minus Term MoreTerms
;
MoreTerms : /* add a term */
e_plus Term MoreTerms
| /* subtract a term */
e_minus Term MoreTerms
| /* add a negetive term */
e_plus e_minus Term MoreTerms /* Add a negetive term */
| /* minus a negetive term */
e_minus e_minus Term MoreTerms /* Subtract a negetive term */
| /* no extra terms */
;
Term : Factor MoreFactors {equ->addElement("*", Equation::multiplication)};
;
MoreFactors: e_mult Factor MoreFactors
| e_div Factor MoreFactors
| Factor MoreFactors
|
;
Factor : e_num { std::cout << $1 << std::endl; } //returns zero no matter where I put this
| e_var
| Group
| Function
;
BeginGroup : lparen | slparen;
EndGroup : rparen | srparen;
Group : BeginGroup Expression EndGroup
;
Function : TrigFuncs
| PowerFunc
;
TrigFuncs : e_sin lparen Expression rparen
| e_cos lparen Expression rparen
| e_tan lparen Expression rparen
| e_csc lparen Expression rparen
| e_sec lparen Expression rparen
| e_cot lparen Expression rparen
| e_asin lparen Expression rparen
| e_acos lparen Expression rparen
| e_atan lparen Expression rparen
;
PowerFunc : e_num e_pow Factor
| e_var e_pow Factor
| Group e_pow Factor
| TrigFuncs e_pow Factor
;
I think it's pretty clear what I'm doing. As you can see, the scanner stores yytext into the equation class along with its code, but sometimes the parser must add information to the equation class, and this is where things can get hectic. For one, trying to add code before or in the middle of the statement can lead to massive shift/reduce conflicts. Secondly, the effect of putting the code at the end of statement is to record things out of order. Look at the rule for term. If I type "ax", this implicitly means, "a" times "x" or "a*x". I want the parser to add the multiplication into my structure in, but the parser does this out of order. So is there a better way to accomplish my goal?
Upvotes: 1
Views: 1092
Reputation: 5893
You're probably wondering why no one has answered your question in four months; when it seems like such a simple problem with a calculator. This is because it is not a simple question to answer. Its a portmanteau of a question with lots of hidden nooks and crannies for the unwary. Now there are quite a few bison experts contributing to answers on Stack Overflow who really know their stuff, and all of them have avoided this like a plague. If you'd simplified the problem and asked about just one thing at a time you might have got some answers, but you just pasted in the whole raft of code and dropped in some "Oh, by the way, fix everything else while you're at it!". However, if someone wanted to debug your code they couldn't because you missed out a whole load of critical stuff, like the Equation
object. StackOverflow is not a bunch of free coders you know! Have a read of some of the guidance.
Let's start with something simple; this $$
and $1
stuff which you started with. You called them "semantic values". Not really. They are a mechanism of passing a value back through the parse tree, and also for the lexical analyser to associate a token value to give to the parser. The reason you always get a zero is you never assigned any value to it in the lexer in the first place. The value is assigned by reference to the built in variable yylval
. That's in the documentation. Lets take your number lexeme which returns an e_num
token. I could write this (in math.l
):
{number} {yylval = atoi(yytext);return(e_num);}
This permits us to now print the value of the number:
Factor : e_num { std::cout << $1 << std::endl; }
It looks much better if you use the name of the enumerated constant rather than those absolute values. Pretty poor coding to hard-wire those numbers in like that.
I also see that you've created your stack and push things on it in the lexer. Probably not a good plan, but lets go with that for now. You can associate the token with such an object value if you wanted. The %union
construct (in bison) is used for that:
%union {
Equation* TokenValue;
}
Now you have to starting typing the terminals and non-terminals of your grammar:
%token <TokenValue> e_enum;
But now we can get the value using these structures if you wish:
{number} {equ->addElement(yytext, Equation::number);
yylval.TokenValue = equ;
return(e_num);
}
and in bison:
Factor : e_num { std::cout << $1->ElementText << std::endl; }
In the bison grammar there are places where you might want to pass back several values from the right hand side of the rule to the left hand side. This is where the $$ = $2
form of construct would be used, but I'll leave you to read about that.
The next thing to mention is the way you have described the grammar of arithmetic expressions. There are two problems, mainly due to the handling of negatives and the grammar form.
There is a bit of computer science we need here. You have (mainly) represented the expressions as lists of operators and operands, which is regarded as a Regular Language or Chomsky Type 3 form. The problem is that arithmetic expressions are, in the main, nested in structure. Nesting requires a Context Free Grammar or Chomsky Type 2. This is why all examples of calculator grammars use the form:
exp : exp OP exp
rather than the listing form you have used.
Now you have used rule hierarchies to get some of the priorities of the operators in the grammar, but unfortunately the unary minus (or negation) operator would have a higher priority than the binary subtraction operator and thus should appear in the Factor
rule. These are the reasons why you get so many shift/reduce conflicts. It's just not the right way to do it. There is a reason all the text books do calculators and expressions in a way different to your example. You need to go back to the text books.
This is why people study this stuff at college. I hope that helps some readers who are looking at similar problems in the future.
Upvotes: 2