Reputation: 5177
OK, so here is the deal.
In my language I have some commands, say
XYZ 3 5
GGB 8 9
HDH 8783 33
And in my Lex file
XYZ { return XYZ; }
GGB { return GGB; }
HDH { return HDH; }
[0-9]+ { yylval.ival = atoi(yytext); return NUMBER; }
\n { return EOL; }
In my yacc file
start : commands
;
commands : command
| command EOL commands
;
command : xyz
| ggb
| hdh
;
xyz : XYZ NUMBER NUMBER { /* Do something with the numbers */ }
;
etc. etc. etc. etc.
My question is, how can I get the entire text
XYZ 3 5
GGB 8 9
HDH 8783 33
Into commands while still returning the NUMBERs?
Also, when my Lex returns a STRING ([0-9a-zA-Z]+) and I want to verify its length, should I do it like
rule: STRING STRING { if (strlen($1) < 5) { /* do something */ } else { /* report an error */ } }
or should the lexer itself return different tokens depending on the length?
Upvotes: 0
Views: 5645
Reputation: 754920
If you arrange for your lexical analyzer (yylex()) to store the whole string in some variable, then your code can access it. The communication with the parser proper will be through the normal mechanisms, but there's nothing that says you can't also have another variable lurking around (probably a file static variable, but beware multithreading) that stores the whole input line before it is dissected.
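A minimal sketch of such a side buffer, assuming lines fit in 255 characters. The names line_append, line_text and line_reset are made up for illustration and are not part of lex or yacc:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper names; only the idea of a file static buffer
 * comes from the answer above. */
#define LINE_MAX_LEN 256

static char line_buf[LINE_MAX_LEN];  /* raw text of the current input line */
static size_t line_len = 0;          /* number of characters stored so far */

/* Append the text of one token. Meant to be called from every lexer rule
 * with (yytext, yyleng) before the rule returns. */
void line_append(const char *text, size_t len)
{
    /* Keep room for the terminating NUL; overlong lines are truncated. */
    if (len > LINE_MAX_LEN - 1 - line_len)
        len = LINE_MAX_LEN - 1 - line_len;
    memcpy(line_buf + line_len, text, len);
    line_len += len;
    line_buf[line_len] = '\0';
}

/* Read-only view of everything scanned on the current line so far. */
const char *line_text(void)
{
    return line_buf;
}

/* Forget the current line. Meant to be called once the parser is done
 * with it. */
void line_reset(void)
{
    line_len = 0;
    line_buf[0] = '\0';
}
```

A rule would then look something like [0-9]+ { line_append(yytext, yyleng); yylval.ival = atoi(yytext); return NUMBER; }, with a similar rule for whitespace if you want the spaces preserved. One caveat: the parser may already have read one token ahead when an action runs, so it is probably safer to call line_reset() from a parser action than from the lexer's \n rule.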
Upvotes: 1
Reputation: 34998
Since you use yylval.ival, you already have a union with an ival field in your YACC source, like this:
%union {
int ival;
}
Now specify the token's type, like this:
%token <ival> NUMBER
Now you can access the ival field of a NUMBER token simply as $n in your rules, where n is the token's position in the rule, like
xyz : XYZ NUMBER NUMBER { printf("XYZ %d %d", $2, $3); }
For your second question, I'd define the union like this:
%union {
char* strval;
int ival;
}
and in your YACC source specify the token types
%token <strval> STRING
%token <ival> NUMBER
So now you can do things like
foo : STRING NUMBER { printf("%s (len %zu) %d", $1, strlen($1), $2); }
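For $1 to still point at valid text when the action runs, the lexer has to hand over a copy of yytext, because the scanner reuses that buffer for later tokens. A minimal sketch of such a copy, where token_copy is a hypothetical helper name:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, NUL-terminated copy of the
 * first len characters of text, or NULL if allocation fails. The caller
 * (typically a parser action) is responsible for free()ing the result. */
char *token_copy(const char *text, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len);
    copy[len] = '\0';
    return copy;
}
```

The STRING rule in the Lex file could then be written along the lines of [0-9a-zA-Z]+ { yylval.strval = token_copy(yytext, yyleng); return STRING; }.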
Upvotes: 0
Reputation: 52334
If I've understood your first question correctly, you can have semantic actions like
{ $$ = makeXYZ($2, $3); }
which will allow you to build the value of command as you want.
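As a rough sketch of what makeXYZ might look like: only the function name comes from the line above, while the struct layout and the enum are assumptions made for illustration.

```c
#include <stdlib.h>

/* Hypothetical command representation, one kind per keyword. */
enum command_kind { CMD_XYZ, CMD_GGB, CMD_HDH };

struct command {
    enum command_kind kind;  /* which keyword introduced the command */
    int arg1;                /* first NUMBER */
    int arg2;                /* second NUMBER */
};

/* Build an XYZ command from its two numeric arguments.
 * Returns NULL if allocation fails; the caller owns the result. */
struct command *makeXYZ(int a, int b)
{
    struct command *cmd = malloc(sizeof *cmd);
    if (cmd == NULL)
        return NULL;
    cmd->kind = CMD_XYZ;
    cmd->arg1 = a;
    cmd->arg2 = b;
    return cmd;
}
```

For $$ to carry this value, the %union would need a member such as struct command *cmd, and the xyz and command nonterminals would need a matching %type <cmd> declaration.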
For your second question, the boundaries between lexical analysis and grammatical analysis, and between grammatical analysis and semantic analysis, are not hard and fixed. Moving them is a trade-off between factors such as ease of description, clarity of error messages, and robustness in the presence of errors. For a string-length check, the likelihood of an error occurring is quite high, and the error message will probably be unclear if the check is handled by returning different terminals for different lengths. So if it is possible (that depends on the grammar), I'd handle it in the semantic analysis phase, where the message can easily be tailored.
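To illustrate the kind of tailored message a semantic check allows, here is a minimal sketch. The helper name check_length_below is hypothetical, and the limit and the direction of the comparison are simply taken from the strlen($1) < 5 test in the question:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if s is shorter than limit characters.
 * Otherwise returns 0 and writes a descriptive message into msg
 * (at most msg_size bytes, always NUL-terminated when msg_size > 0). */
int check_length_below(const char *s, size_t limit, char *msg, size_t msg_size)
{
    size_t len = strlen(s);
    if (len < limit) {
        if (msg_size > 0)
            msg[0] = '\0';  /* no error to report */
        return 1;
    }
    snprintf(msg, msg_size,
             "'%s' has %zu characters, expected fewer than %zu",
             s, len, limit);
    return 0;
}
```

In a semantic action this might be used as { char msg[128]; if (!check_length_below($1, 5, msg, sizeof msg)) yyerror(msg); }, assuming your yyerror takes a plain string.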
Upvotes: 1