named variables ambiguous with other token identifiers

Question

I'm using a lexer that needs to be able to tell the difference between a named variable and a keyword.

To elaborate, in my .l file I have some definitions like:

"QUIT" {return QUIT;}
"AND"  {return AND;}
"XOR"  {return XOR;}

and I also have the definition of a name (for a variable):

[a-zA-Z][a-zA-Z0-9]* {memcpy(yylval.name, yytext, strlen(yytext) + 1); return NAME;}

My issue is that my keywords (QUIT, AND, XOR) each also satisfy the rule for NAME, so the input is ambiguous.

How do I work around this?

rici · Accepted Answer

Put the keywords first.

Flex-generated scanners always select the longest match; if more than one pattern matches that same longest text, the one that appears first in the file is selected. So if you have:

QUIT {return QUIT;}
AND  {return AND;}
XOR  {return XOR;}
[a-zA-Z][a-zA-Z0-9]* { memcpy(yylval.name, yytext, yyleng + 1); return NAME; }
[[:space:]]+  /* Ignore */

and your input is:

QUITAND QUIT AND

then your scanner will return three tokens: NAME, QUIT, AND. If you had put the NAME pattern first, then you'd get three NAME tokens. (There is no need to quote alphanumeric characters in a flex pattern. The things which need to be quoted are regex operators.)
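To make the selection rule concrete, here is a small hand-written C sketch that mimics it for these four rules. It is not what flex generates (flex compiles the patterns into a table-driven automaton); the names next_token, match_name, match_keyword and the TOK_ codes are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes, standing in for the ones bison would generate. */
enum token { TOK_NONE = 0, TOK_QUIT, TOK_AND, TOK_XOR, TOK_NAME };

/* Length of the prefix of s matching [a-zA-Z][a-zA-Z0-9]*, or 0 if none. */
static size_t match_name(const char *s)
{
    size_t n = 0;
    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}

/* Length of kw if s starts with it, otherwise 0. */
static size_t match_keyword(const char *s, const char *kw)
{
    size_t n = strlen(kw);
    return strncmp(s, kw, n) == 0 ? n : 0;
}

/*
 * Try every rule, in file order, at position s and keep the longest match.
 * The comparison is strict (>), so a later rule never displaces an earlier
 * rule that matched the same number of characters.
 */
enum token next_token(const char *s, size_t *len)
{
    static const char *const keywords[] = { "QUIT", "AND", "XOR" };
    static const enum token codes[] = { TOK_QUIT, TOK_AND, TOK_XOR };
    enum token best = TOK_NONE;
    size_t i, n;

    assert(s != NULL && len != NULL);
    *len = 0;
    for (i = 0; i < 3; i++) {           /* keyword rules come first */
        n = match_keyword(s, keywords[i]);
        if (n > *len) { *len = n; best = codes[i]; }
    }
    n = match_name(s);                  /* NAME is the last rule */
    if (n > *len) { *len = n; best = TOK_NAME; }
    return best;
}
```

At the start of "QUITAND QUIT AND" the NAME rule wins because it matches seven characters against four for QUIT. At the start of "QUIT AND" both rules match four characters, so the earlier rule, QUIT, is chosen.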

The memcpy is not a good idea. I modified it to avoid the redundant call to strlen, but that's really the least of the problems. I assume you're combining it with a declaration of yylval.name as a fixed-length character array. If so, you should verify that the NAME is not too long to fit in the space provided. But it's still a bad idea because the bison-generated parser assumes that stacked values are not too big; it does not try to avoid copying them when that makes the generated code more convenient.
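One common way around both problems, sketched here as an assumption rather than as something taken from the question, is to keep only a pointer in the semantic value and copy the lexeme to the heap. The helper name copy_lexeme and the MAX_NAME_LEN limit are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical upper bound on identifier length; pick whatever suits the language. */
#define MAX_NAME_LEN 255

/*
 * Return a heap-allocated, NUL-terminated copy of the first len bytes of text.
 * Returns NULL if the lexeme is longer than MAX_NAME_LEN or if malloc fails.
 * The caller (for example a parser action) is responsible for free()ing it.
 */
char *copy_lexeme(const char *text, size_t len)
{
    char *copy;

    assert(text != NULL);
    if (len > MAX_NAME_LEN)
        return NULL;
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len);
    copy[len] = '\0';
    return copy;
}
```

Assuming name is declared as a char * member of the %union, the NAME action could then read: { yylval.name = copy_lexeme(yytext, yyleng); return NAME; }. A copy is needed in any case because the contents of yytext are overwritten by the next call to yylex. The stacked value is now a single pointer, so it is cheap for the parser to copy, but the actions that consume a NAME have to free it.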
