DevDevDev

Reputation: 5177

How to get entire input string in Lex and Yacc?

OK, so here is the deal.

In my language I have some commands, say

XYZ 3 5
GGB 8 9
HDH 8783 33

And in my Lex file

XYZ     { return XYZ; }
GGB     { return GGB; }
HDH     { return HDH; }
[0-9]+  { yylval.ival = atoi(yytext); return NUMBER; }
\n      { return EOL; }
[ \t]+  { /* skip blanks; without a rule for them lex echoes them to stdout */ }

In my yacc file

start : commands
      ;

commands : command
         | command EOL commands
         ;

command : xyz
        | ggb
        | hdh
        ;

xyz : XYZ NUMBER NUMBER { /* Do something with the numbers */ }
    ;

etc. etc. etc. etc.

My question is, how can I get the entire text

XYZ 3 5
GGB 8 9
HDH 8783 33

Into commands while still returning the NUMBERs?

Also, when my Lex returns a STRING [0-9a-zA-Z]+ and I want to verify its length, should I do it like

rule: STRING STRING { if (strlen($1) < 5) { /* do something */ } else { /* report an error */ } }

or actually have a token in my Lex that returns different tokens depending on length?

Upvotes: 0

Views: 5645

Answers (3)

Jonathan Leffler

Reputation: 754920

If you arrange for your lexical analyzer (yylex()) to store the whole string in some variable, then your code can access it. The communication with the parser proper will still go through the normal mechanisms (token codes and yylval), but nothing stops you from also keeping another variable lurking around (probably a file-static variable, but beware of multithreading) that stores the whole input line before it is dissected.
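A minimal sketch of that idea, assuming flex. The helpers line_reset, line_append and line_text below are hypothetical (not part of lex, yacc or the original code); they keep a file-static copy of the current line, built up from the text of every token the lexer matches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a file-static buffer mirroring the raw text of the
   current input line. Not thread-safe, as the answer warns. */
#define LINE_MAX_LEN 1024

static char   line_buf[LINE_MAX_LEN];
static size_t line_len = 0;

/* Start a fresh line (e.g. after a command line has been handled). */
void line_reset(void)
{
    line_len = 0;
    line_buf[0] = '\0';
}

/* Append the text of one matched token; truncates silently if too long. */
void line_append(const char *text, size_t len)
{
    assert(text != NULL);
    size_t room = LINE_MAX_LEN - 1 - line_len;   /* always >= 0 */
    if (len > room)
        len = room;
    memcpy(line_buf + line_len, text, len);
    line_len += len;
    line_buf[line_len] = '\0';
}

/* The whole line seen so far, as a NUL-terminated string. */
const char *line_text(void)
{
    return line_buf;
}
```

With flex, the YY_USER_ACTION macro is executed before every rule's action, so putting #define YY_USER_ACTION line_append(yytext, yyleng); in the %{ ... %} block of the definitions section feeds every lexeme into the buffer, blanks included, as long as some rule matches them. Where to call line_reset() is an assumption to check against your grammar: yacc may read a lookahead token before it reduces a rule, so the buffer can already contain the start of the next line when an action runs.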

Upvotes: 1

qrdl

Reputation: 34998

Since you use yylval.ival, you already have a %union with an ival field in your YACC source, like this:

%union {
    int ival;
}

Now specify the token's type, like this:

%token <ival> NUMBER

Now you can access the ival field of a NUMBER token simply as $n, where n is the token's position in the rule; in the xyz rule the two numbers are $2 and $3:

xyz : XYZ NUMBER NUMBER { printf("XYZ %d %d", $2, $3); }

For your second question I'd define the union like this:

%union {
    char*   strval;
    int     ival;
}

and in your YACC source specify the token types

%token <strval> STRING
%token <ival> NUMBER

So now you can do things like

foo : STRING NUMBER { printf("%s (len %zu) %d", $1, strlen($1), $2); }
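For $1 to still be valid in that action, the lexer has to store a copy of the text in yylval.strval rather than yytext itself, because yytext points into the scanner's buffer and is overwritten by later matches. A minimal sketch with a hypothetical helper copy_lexeme (POSIX strdup(yytext) does the same job where it is available):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap-allocated, NUL-terminated copy of the
   first len characters of text (e.g. yytext/yyleng). The parser action that
   consumes the token is responsible for free()ing it. */
char *copy_lexeme(const char *text, size_t len)
{
    assert(text != NULL);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;             /* let the caller report out-of-memory */
    memcpy(copy, text, len);
    copy[len] = '\0';
    return copy;
}
```

In the lex rules this would be used as

[0-9a-zA-Z]+  { yylval.strval = copy_lexeme(yytext, yyleng); return STRING; }

and the foo action would end with free($1); once the string is no longer needed.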

Upvotes: 0

AProgrammer

Reputation: 52334

If I've understood your first question correctly, you can have semantic actions like

{ $$ = makeXYZ($2, $3); }

which will allow you to build the value of command as you want.
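As an illustration, assume command values are nodes of a small linked list. The struct command type, command_to_text and this body for makeXYZ are hypothetical. command_to_text shows that the text of a command line can be rebuilt from the semantic values, so the numbers stay available as $2 and $3 while commands still ends up holding every command.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST node for one command line. */
enum command_kind { CMD_XYZ, CMD_GGB, CMD_HDH };

struct command {
    enum command_kind kind;
    int a, b;                  /* the two NUMBER operands */
    struct command *next;      /* next command in the input */
};

static struct command *make_command(enum command_kind kind, int a, int b)
{
    struct command *c = malloc(sizeof *c);
    if (c == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    c->kind = kind;
    c->a = a;
    c->b = b;
    c->next = NULL;
    return c;
}

/* One possible body for the answer's makeXYZ($2, $3). */
struct command *makeXYZ(int a, int b)
{
    return make_command(CMD_XYZ, a, b);
}

/* Rebuild the source text of one command, e.g. "XYZ 3 5".
   Returns what snprintf returns (the length of the full text). */
int command_to_text(const struct command *c, char *buf, size_t size)
{
    static const char *const names[] = { "XYZ", "GGB", "HDH" };
    assert(c != NULL && buf != NULL);
    return snprintf(buf, size, "%s %d %d", names[c->kind], c->a, c->b);
}
```

In the grammar it could be wired up roughly like this, with the struct declared in the %{ ... %} prologue so that the %union can refer to it:

%union { int ival; struct command *cmd; }
%token <ival> NUMBER
%type <cmd> commands command xyz

commands : command
         | command EOL commands  { $1->next = $3; $$ = $1; }
         ;
xyz : XYZ NUMBER NUMBER { $$ = makeXYZ($2, $3); }
    ;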

For your second question: the borders between lexical and grammatical analysis, and between grammatical and semantic analysis, aren't hard and fixed. Moving them is a trade-off between factors such as ease of description, clarity of error messages, and robustness in the presence of errors. For string-length verification, errors are quite likely to occur, and if they are handled by returning different terminals for different lengths, the resulting error message (typically a generic syntax error) will probably not be clear. So, if the grammar allows it, I'd handle the check in the semantic analysis phase, where the message can easily be tailored.
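A minimal sketch of doing the check in the semantic phase. The helper check_max_length is hypothetical; it prints a message that names the offending value and returns 0 on failure, leaving the action to decide what happens next.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for semantic actions: returns 1 if s is at most
   max_len characters long; otherwise prints a specific message to err
   and returns 0. */
int check_max_length(const char *what, const char *s, size_t max_len, FILE *err)
{
    assert(what != NULL && s != NULL && err != NULL);
    size_t len = strlen(s);
    if (len <= max_len)
        return 1;
    fprintf(err, "error: %s \"%s\" is %zu characters long; at most %zu are allowed\n",
            what, s, len, max_len);
    return 0;
}
```

The question's strlen($1) < 5 corresponds to max_len 4, so the rule could read

rule: STRING STRING { if (!check_max_length("first name", $1, 4, stderr)) YYERROR; /* otherwise use $1 and $2 */ }

where YYERROR is the yacc macro that starts normal error recovery after the tailored message has been printed.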

Upvotes: 1
