EmadSmart

Reputation: 81

How to check the lex input as a single input

I have written a lex file, shown below:

%%
[\t\n]
"if" {printf("IF_TOKEN\n");}
"else" {printf("ELSE_TOKEN\n");}
"while" {printf("WHILE_TOKEN\n");}
"FOR" {printf("FOR_TOKEN\n");}
"BREAK" {printf("BREAK_TOKEN\n");}
"float" {printf("FLOAT_TOKEN\n");}
"int" {printf("INT_TOKEN\n");}
"long" {printf("LONG_TOKEN\n");}
"return" {printf("RETURN_TOKEN\n");}
"defFunction" {printf("DEFFUNCTION_TOKEN\n");}
"defClass" {printf("DEFCLASS_TOKEN\n");}
"\(" {printf("PAROPEN_TOKEN\n");}
"\)" {printf("PARCLOS_TOKEN\n");}
"\{" {printf("CBROPEN_TOKEN\n");}
"\}" {printf("CBRCLOS_TOKEN\n");}
"<" {printf("LESSTHN_TOKEN\n");}
">" {printf("GRTRTHN_TOKEN\n");}
"=" {printf("EQUALTO_TOKEN\n");}
"!=" {printf("NEQUALTO_TOKEN\n");}
"\+" {printf("SUM_TOKEN\n");}
"-" {printf("MINUS_TOKEN\n");}
"\*" {printf("STAR_TOKEN\n");}
"\/" {printf("SLASH_TOKEN\n");}
"%" {printf("REMAIN_TOKEN\n");}
"\[" {printf("BRAOPEN_TOKEN\n");}
"\]" {printf("BRACLOS_TOKEN\n");}
";" {printf("SEMICOL_TOKEN\n");}
[-]?[1-9][0-9]* {printf("NUMBER\n");}
[A-Za-z&_$][A-Za-z$_]* {printf("ID\n");}
. {printf("ERROR");}

%%
int yywrap (void) {
    return 1;
}
int main (int argc, char** argv) {
    yylex();
    return 0;
}

If I give 125apple as input to the scanner built from this .l file, it should print ERROR, but it prints NUMBER followed by ID. How can I make 125apple be treated as a single token?

Upvotes: 0

Views: 152

Answers (1)

rici

Reputation: 241721

In many languages, that's exactly how 125apple would be lexed, in part because that's the way a naive lex scanner definition works.

If you want it to be an error, you need to make it one explicitly, by adding a pattern which matches erroneous tokens. Lex always takes the longest match, and when two patterns match the same length it takes the one listed first. So by putting the error pattern after the pattern which matches valid numbers, you avoid triggering an error on inputs which match both patterns, which means the error pattern is free to match valid tokens as well. That makes it a bit easier to write.

0|[-]?[1-9][0-9]* {printf("NUMBER\n");}
[-]?[0-9]+[0-9A-Za-z_]* {printf("ERROR\n");}
[A-Za-z&_$][A-Za-z$_]* {printf("ID\n");}

Above, I made a little change: your number pattern does not recognize 0, so I added it.

The error line not only catches 125apple. It also catches other erroneous tokens, like 0037 and -0. (I'm not convinced that -0 should be an error; you might want to fix that.) It does not treat 123$apple as an error, so you might want to change that, too.

Upvotes: 2
