Luis Masuelli

Reputation: 12333

Why is lex trying to match the whole line instead of just a token?

I have this lex file:

COMMENT \#.*\n
SPACE [\x20\n\r\t]
L [a-zA-Z_]
D [0-9]

%%

{COMMENT}                      |
{SPACE}+                       ;
{L}({L}|{D})*                  { printf("identifier token: %s\n", yytext); return 1; }
-?{D}*                         { printf("int number token: %s\n", yytext); return 1; }
.*                             { printf("invalid token: %s\n", yytext); return -1; }

%%

#include <stdio.h>

int yywrap() {
    return 1;
}

int main() {
    while(yylex() > 0) {};
    return 0;
}

And I have, say, two files.

Case 1:

#comentario de prueba
   print nestor

Case 2:

#comentario de mierda
print

With this lex definition, the first case prints "invalid token: print nestor", while the second case finishes with no error.

What am I doing wrong? The intention is for the first case to produce the tokens (spaces)(identifier)(spaces)(identifier).

Upvotes: 0

Views: 336

Answers (1)

Kevin

Reputation: 56059

Lex always picks the rule that gives the longest match at the current position. In this case, that's going to be

.*                        { printf("invalid token: %s\n", yytext); return -1; }

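.* matches everything up to the end of the line, so in the first case it beats {SPACE}+, which only matches the leading spaces. In the second case, print and .* match the same five characters, and lex breaks the tie in favor of the rule listed first, so the identifier rule wins there. Take out the *; just . should work, because it matches a single character and therefore only wins when no other rule matches at all.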
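To make the longest-match and tie-breaking behaviour concrete, here is a minimal hand-written sketch in C. It is not how lex works internally (lex compiles the rules into a state machine); it only compares match lengths for the five rules from the question. All names here (pick_rule, match_invalid, the dot_star flag, and so on) are hypothetical and exist only for this illustration. A zero-length match is treated as "no match", which is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LETTERS "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_"
#define DIGITS  "0123456789"

/* Rules in the same order as in the lex file from the question. */
enum rule { RULE_COMMENT, RULE_SPACE, RULE_IDENT, RULE_INT, RULE_INVALID, RULE_COUNT };

/* \#.*\n : a '#', anything up to the newline, and the newline itself. */
static size_t match_comment(const char *s) {
    size_t n;
    if (s[0] != '#') return 0;
    n = strcspn(s, "\n");
    return s[n] == '\n' ? n + 1 : 0;
}

/* [\x20\n\r\t]+ */
static size_t match_space(const char *s) {
    return strspn(s, " \n\r\t");
}

/* {L}({L}|{D})* : must start with a letter or underscore. */
static size_t match_ident(const char *s) {
    if (strspn(s, LETTERS) == 0) return 0;
    return strspn(s, LETTERS DIGITS);
}

/* -?{D}* : optional minus sign followed by digits. */
static size_t match_int(const char *s) {
    size_t sign = (s[0] == '-') ? 1 : 0;
    return sign + strspn(s + sign, DIGITS);
}

/* Catch-all rule: ".*" when dot_star is nonzero, a single "." otherwise.
   In lex, "." matches any character except newline. */
static size_t match_invalid(const char *s, int dot_star) {
    if (dot_star) return strcspn(s, "\n");
    return (s[0] != '\0' && s[0] != '\n') ? 1 : 0;
}

/* Returns the winning rule at the start of s and stores the match length
   in *len. Returns RULE_COUNT if nothing matches. */
enum rule pick_rule(const char *s, int dot_star, size_t *len) {
    size_t lens[RULE_COUNT];
    enum rule best = RULE_COUNT;
    size_t best_len = 0;
    int r;

    assert(s != NULL && len != NULL);

    lens[RULE_COMMENT] = match_comment(s);
    lens[RULE_SPACE]   = match_space(s);
    lens[RULE_IDENT]   = match_ident(s);
    lens[RULE_INT]     = match_int(s);
    lens[RULE_INVALID] = match_invalid(s, dot_star);

    /* Strictly greater: on a tie, the rule listed first keeps the win. */
    for (r = 0; r < RULE_COUNT; r++) {
        if (lens[r] > best_len) {
            best_len = lens[r];
            best = (enum rule)r;
        }
    }
    *len = best_len;
    return best;
}
```

Calling pick_rule("   print nestor\n", 1, &n) selects RULE_INVALID with a 15-character match, while pick_rule("print\n", 1, &n) selects RULE_IDENT, because the identifier rule ties with .* and comes first. With dot_star set to 0 (the fix from the answer), the first input selects RULE_SPACE with a 3-character match.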

Upvotes: 2
