Reputation: 1565
I have tried to sit here and read tutorials on how YACC works with the lex file however I'm not sure I can wrap my head around it. I understand it is for reading the actual input file and determining if functions such as add or subtract are in the right format, but how does this work? I understand how Lex files work and have constructed one that returns defined variables based on what it comes across in the file.
If I have a program that is very simple, how would I go about analyzing the first lines of this test programming language. Let's assume "program" is a defined value in lex, as well as "is", "var", "begin", "+", "print", ";", ",", and "end".
How would the yacc file need to be written in order for the first few lines to be read?
Test file:
program xyz is
var a, b, c
begin
a = 2;
b = 3;
c = a + b;
print c
end
Yaccer.y
%token EOFNUM 0
%token SEMINUM 1
%token LPARENNUM 2
%token RPARENNUM 3
%token ICONSTNUM 4
%token BEGINNUM 5
%token PROGRAMNUM 6
%token MINUSNUM 7
%token TIMESNUM 8
%token VARNUM 9
%token COMMANUM 10
%token IDNUM 11
%token ENDNUM 12
%token ISNUM 13
%token PLUSNUM 14
%token DIVNUM 15
%token PRINTNUM 16
%token EQUALNUM 17
%left '+' '-'
%left '*' '/'
%%
%%
#include "lex.yy.c"
#include <stdio.h>
yyerror(str)
char *str;
{ printf("yyerror: %s at line %d\n", str, yyline); }
main () {
if (!yyparse()) { printf("accept\n");}
else { printf("reject\n"); }
}
I know there needs to be some %types defined for the equations, however I am not sure how to go about using the types along with the %left declarations, or even if this is correct. I am also confused as to how I analyze the first line.
Upvotes: 1
Views: 1854
Reputation: 5744
You would need to create both a lex file and a yacc file. In the lex file you define the individual tokens for every type of token than your grammar contain. Then you construct the grammar in BNF form suitable for yacc. For your simple input it looks a bit like this. I assume that it's a typo that the print
statement doesn't have a ;
after it, and this grammar requires a semicolon.
%token IDENTIFIER
%token PROGRAM
%token BEGIN
%token END
%token IS
%token VAR
%token PRINT
%token NUMBER
program
statementlist
statement
printstatement
assignstatement
expression
%%
program : PROGRAM IDENTIFIER IS VAR variables BEGIN statementlist END;
variables : IDENTIFIER | variables ',' IDENTIFIER;
statementlist : statement | statementlist statement;
statement : assignstatement | printstatement;
printstatement : PRINT IDENTIFIER ';';
assignstatement : IDENTIFIER '=' expression ';';
expression : value | expression '+' value;
value : NUMBER | IDENTIFIER;
%%
The %left
and %right
are associativity modifiers, and aren't really needed in your case. They would be needed if you were to support -
using the same precedence as +
. Well, technically not really needed but they make the grammar easier to understand.
Yacc isn't easy to understand, and I recommend a tutorial on the subject. Rules very quickly become recursive in nature and you have to have a certain mindset when working with it. It's easier to build your grammar incrementally, starting with the highest order statement. Make a parser that just recognizes program begin end and then work in the features as you go.
A good tool to test and verify grammars is the javascript parser generator found here
Upvotes: 1