Reputation: 321
I'm trying to write a regular expression in flex that matches only valid identifier names (a name cannot start with a digit). I'm using this code:
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
%}
%%
"if" { printf("IF "); }
[a-zA-Z_][a-zA-Z_0-9]* { printf("%s ", yytext); }
%%
int main() {
yylex();
}
But it is not working. How can I make sure that flex accepts only valid identifiers?
When I provide the input:
if
abc
9abc
I see the following output:
IF
abc
9abc
but I expected:
IF
abc
(nothing)
Upvotes: 2
Views: 4128
Reputation: 241791
Your patterns do not match all possible inputs.
In such cases, (f)lex adds a default catch-all rule, of the form
.|\n { ECHO; }
In other words, any character not recognized by your patterns will simply be printed on stdout. That will be the case with the newline characters in your input, as well as with the digit 9. After the 9 is recognized by the default rule, the remaining input will again be recognized by your identifier rule.
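To make the effect of the default rule concrete, here is a rough hand-written C sketch that mimics how the scanner in the question behaves. It is not what flex generates; `ident_len` and `simulate_scanner` are hypothetical names invented for this illustration, and the sketch assumes the "C" locale so that `isalpha` and `isalnum` agree with `[a-zA-Z]` and `[a-zA-Z0-9]`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Length of the longest prefix of s matching [a-zA-Z_][a-zA-Z_0-9]*,
 * or 0 if s does not start with a letter or underscore. */
size_t ident_len(const char *s)
{
    size_t n = 0;
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Hypothetical helper: writes into out what the question's scanner
 * would print for the input in. Output is truncated if out is too small. */
void simulate_scanner(const char *in, char *out, size_t out_size)
{
    size_t used = 0;

    assert(in != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    while (*in != '\0') {
        size_t len = ident_len(in);
        int written;

        if (len == 2 && strncmp(in, "if", 2) == 0) {
            /* Both rules match two characters; the earlier rule wins. */
            written = snprintf(out + used, out_size - used, "IF ");
        } else if (len > 0) {
            /* Identifier rule: longest match. */
            written = snprintf(out + used, out_size - used,
                               "%.*s ", (int)len, in);
        } else {
            /* Default rule: echo one unmatched character unchanged. */
            written = snprintf(out + used, out_size - used, "%c", *in);
            len = 1;
        }

        if (written < 0 || (size_t)written >= out_size - used)
            break; /* out of space: stop with truncated output */
        used += (size_t)written;
        in += len;
    }
}
```

Calling `simulate_scanner("9abc\n", buf, sizeof buf)` first echoes the `9` through the default branch and then emits `abc ` through the identifier branch, so the two pieces appear side by side in the output as if `9abc` had been accepted as a single token.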
So you probably wanted something like this:
%option warn nodefault
%%
[[:space:]]+ ; /* Ignore whitespace */
"if" { /* TODO: Handle an "if" token */ }
[[:alpha:]_][[:alnum:]_]* { /* TODO: Handle an identifier token */ }
. { /* TODO: Handle an error */ }
Instead of printing information to stdout in an action as a debugging or learning aid, I strongly suggest you use the -d (or --debug) option when you are building your scanner. That will automatically output debugging information in a consistent and complete manner; it would have told you that the default rule was being matched, for example.
%option nodefault tells flex not to insert a default rule. I recommend always using it, because it will keep you out of trouble. The warn option ensures that a warning is issued if your rules do not cover every possible input; I think that warn is default flex behaviour, but the manual suggests using it and it cannot hurt.
It's good style to use standard character class expressions. Inside a character class ([…]), [:xxx:] matches anything for which the standard library function isxxx would return true. So [[:space:]]+ matches one or more whitespace characters, including space, tab, and newline (and some others), [[:alpha:]_] matches any letter or an underscore, and [[:alnum:]_]* matches any number (including 0) of letters, digits, or underscores. See the Patterns section of the manual.
Upvotes: 3