Reputation: 45
I'm using a lexer that needs to distinguish between a named variable and a keyword.
To elaborate, in my .l file I have some definitions like:
"QUIT" {return QUIT;}
"AND" {return AND;}
"XOR" {return XOR;}
I also have the definition of a name (for a variable):
[a-zA-Z][a-zA-Z0-9]* {memcpy(yylval.name, yytext, strlen(yytext) + 1); return NAME;}
My issue is that my keywords, like QUIT, AND, and XOR, each also satisfy the rule for NAME, which leaves me with an ambiguity.
How do I work around this?
Upvotes: 0
Views: 62
Reputation: 241731
Put the keywords first.
Flex-generated scanners always select the longest match; if more than one pattern matches that same longest text, the pattern that appears first in the file is selected. So if you have:
QUIT {return QUIT;}
AND {return AND;}
XOR {return XOR;}
[a-zA-Z][a-zA-Z0-9]* { memcpy(yylval.name, yytext, yyleng + 1); return NAME; }
[[:space:]]+ /* Ignore */
and your input is:
QUITAND QUIT AND
then your scanner will return three tokens: NAME, QUIT, AND. If you had put the NAME pattern first, then you'd get three NAME tokens. (There is no need to quote alphanumeric characters in a flex pattern. The things which need to be quoted are regex operators.)
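To make the selection rule concrete, here is a small hand-written C sketch of it. This is not how flex is implemented internally (flex compiles the patterns into a state machine); the names next_token, match_literal, match_name and the TOK_ constants are hypothetical and exist only for this illustration. It tries every rule in file order, keeps the longest match, and lets the earlier rule win a tie.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes for this sketch only. */
enum token { TOK_EOF, TOK_QUIT, TOK_AND, TOK_XOR, TOK_NAME, TOK_ERROR };

/* Length of the literal if the input starts with it, otherwise 0. */
static size_t match_literal(const char *s, const char *literal) {
    size_t n = strlen(literal);
    return strncmp(s, literal, n) == 0 ? n : 0;
}

/* Length of the longest prefix matching [a-zA-Z][a-zA-Z0-9]*, or 0. */
static size_t match_name(const char *s) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

/* Return the next token and advance *cursor past it. */
enum token next_token(const char **cursor) {
    assert(cursor != NULL && *cursor != NULL);
    const char *s = *cursor;

    while (isspace((unsigned char)*s)) s++;      /* ignore whitespace */
    if (*s == '\0') { *cursor = s; return TOK_EOF; }

    /* Rules listed in the same order as in the .l file: keywords first. */
    const size_t length[] = {
        match_literal(s, "QUIT"),
        match_literal(s, "AND"),
        match_literal(s, "XOR"),
        match_name(s),
    };
    const enum token kind[] = { TOK_QUIT, TOK_AND, TOK_XOR, TOK_NAME };

    size_t best = 0;
    for (size_t i = 1; i < sizeof length / sizeof length[0]; i++) {
        /* Strict '>' means a later rule only wins with a longer match,
           so an equal-length tie goes to the earlier rule. */
        if (length[i] > length[best]) best = i;
    }

    if (length[best] == 0) { *cursor = s + 1; return TOK_ERROR; }
    *cursor = s + length[best];
    return kind[best];
}
```

Calling next_token repeatedly on "QUITAND QUIT AND" yields NAME, QUIT, AND: for QUITAND the NAME rule matches seven characters against four for QUIT, while for QUIT and AND the keyword rule ties with NAME and wins because it comes first.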
The memcpy is not a good idea. I modified it to avoid the redundant call to strlen, but that's really the least of the problems. I assume you're combining it with a declaration of yylval.name as a fixed-length character array. If so, you should verify that the NAME is not too long to fit in the space provided. But it's still a bad idea because the bison-generated parser assumes that stacked values are not too big; it does not try to avoid copying them when that makes the generated code more convenient.
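One common alternative, which the answer above does not spell out, is to make the semantic value a pointer and copy the token text to the heap, so that the parser stack only ever copies a pointer. The following is a minimal sketch under that assumption; union semantic_value and copy_lexeme are hypothetical names, not part of flex or bison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the parser's value type: it holds a pointer,
   so copying a stacked value copies sizeof(char *) bytes, not a buffer. */
union semantic_value {
    char *name;
};

/* Return a heap-allocated, NUL-terminated copy of the first `length`
   bytes of `text`, or NULL if allocation fails. The caller owns the
   result and must free() it. */
char *copy_lexeme(const char *text, size_t length) {
    assert(text != NULL);
    char *copy = malloc(length + 1);
    if (copy == NULL) return NULL;
    memcpy(copy, text, length);
    copy[length] = '\0';
    return copy;
}
```

In a scanner action this would be used roughly as yylval.name = copy_lexeme(yytext, yyleng); followed by return NAME;, assuming the value type declares name as char *. Nothing can overflow regardless of how long the NAME is, but the parser then becomes responsible for freeing each string once it is done with it.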
Upvotes: 2