Rocky Johnson

Reputation: 321

Valid regular expression for identifier using flex

I'm trying to write a flex pattern that matches only valid identifier names (a name cannot start with a digit). I'm using this code:

%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

%}

%%
"if"                        { printf("IF "); }
[a-zA-Z_][a-zA-Z_0-9]*      { printf("%s ", yytext); }

%%

int main(void) {
    yylex();
    return 0;
}

But it is not working. How can I make sure that flex accepts only valid identifiers?

When I provide the input:

if
abc
9abc

I see the following output:

IF
abc
9abc

but I expected:

IF
abc
(nothing)

Upvotes: 2

Views: 4128

Answers (1)

rici

Reputation: 241791

Your patterns do not match all possible inputs.

In such cases, (f)lex adds a default catch-all rule, of the form

.|\n   { ECHO; }

In other words, any character not recognized by your patterns will simply be printed on stdout. That will be the case with the newline characters in your input, as well as with the digit 9. After the 9 is recognized by the default rule, the remaining input will again be recognized by your identifier rule.
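To make that concrete, here is a rough hand-written C sketch of what the scanner from the question effectively does once the implicit default rule is taken into account. It is not flex output: the names scan, is_start and is_part are invented for this illustration, and it hard-codes the ASCII ranges from the question's pattern.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers mirroring the question's pattern [a-zA-Z_][a-zA-Z_0-9]* */
static int is_start(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

static int is_part(char c) {
    return is_start(c) || (c >= '0' && c <= '9');
}

/* Writes into out what the question's scanner would print for the input in.
 * Assumption: out is large enough; scanning simply stops when it fills up. */
void scan(const char *in, char *out, size_t outsz) {
    size_t n = 0;
    if (outsz == 0) return;
    out[0] = '\0';
    while (*in != '\0' && n + 1 < outsz) {
        if (is_start(*in)) {
            /* Longest match: consume the whole run of identifier characters. */
            size_t len = 1;
            while (is_part(in[len])) len++;
            if (len == 2 && strncmp(in, "if", 2) == 0)
                /* Same length as the identifier match, so the earlier "if" rule wins. */
                n += (size_t)snprintf(out + n, outsz - n, "IF ");
            else
                n += (size_t)snprintf(out + n, outsz - n, "%.*s ", (int)len, in);
            in += len;
        } else {
            /* No user rule matches: the default rule echoes one character. */
            n += (size_t)snprintf(out + n, outsz - n, "%c", *in);
            in++;
        }
    }
}
```

Calling scan("if\nabc\n9abc\n", buf, sizeof buf) leaves "IF \nabc \n9abc \n" in buf: the 9 and the newlines come from the echo branch, and the trailing abc comes from the identifier branch, which is the output shown in the question.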

So you probably wanted something like this:

%option warn nodefault
%%
[[:space:]]+                ; /* Ignore whitespace */
"if"                        { /* TODO: Handle an "if" token */ }
[[:alpha:]_][[:alnum:]_]*   { /* TODO: Handle an identifier token */ }
.                           { /* TODO: Handle an error */ }

Instead of printing information to stdout in an action as a debugging or learning aid, I strongly suggest you use the -d (or --debug) option when you are building your scanner. The generated scanner will then automatically write debugging information to stderr in a consistent and complete manner; it would have told you that the default rule was being matched, for example.

Notes:

  1. %option nodefault tells flex not to insert a default rule. I recommend always using it, because it will keep you out of trouble. The warn option ensures that a warning is issued in this case; I think that warn is default flex behaviour but the manual suggests using it and it cannot hurt.

  2. It's good style to use standard character class expressions. Inside a character class ([]), [:xxx:] matches anything for which the standard library function isxxx would return true. So [[:space:]]+ matches one or more whitespace characters, including space, tab, and newline (and some others), [[:alpha:]_] matches any letter or an underscore, and [[:alnum:]_]* matches any number (including 0) of letters, digits, or underscores. See the Patterns section of the manual.
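As a rough illustration of note 2, the pattern [[:alpha:]_][[:alnum:]_]* can be read as the following C check built on isalpha and isalnum. This is only a sketch: the function name is_identifier is made up for the example, it tests a whole string rather than scanning a stream, and it assumes the default C locale.

```c
#include <ctype.h>
#include <stdbool.h>

/* Returns true when the whole string s matches [[:alpha:]_][[:alnum:]_]*.
 * Hypothetical helper, not part of flex. */
bool is_identifier(const char *s) {
    /* The <ctype.h> functions expect values in the unsigned char range. */
    const unsigned char *p = (const unsigned char *)s;

    /* First character: a letter or an underscore (this also rejects ""). */
    if (!(isalpha(*p) || *p == '_'))
        return false;

    /* Remaining characters: any number of letters, digits, or underscores. */
    for (p++; *p != '\0'; p++) {
        if (!(isalnum(*p) || *p == '_'))
            return false;
    }
    return true;
}
```

With this reading, "abc" and "_x1" are identifiers, while "9abc" is rejected because its first character fails the isalpha test.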

Upvotes: 3
