Guido Muscioni

Reputation: 1295

Flex does not recognize identifiers

I am trying to implement a very simple parser using flex, and I am currently stuck on identifier recognition. This is my code:

ID [a−zA−Z_][a−zA−Z0−9_]*
...
{ID} { printf( "An identifier: %s\n", yytext ); return TOK_ID;}

However, all I get is the first character of the identifier. For example, if I try to parse:

int _underscore ;

The result is:

An identifier: _

Any advice?

EDIT:

After a closer look, I have figured out that the rule only recognizes identifiers made up of a, z, A, Z and _, which are the characters written explicitly in the regular expression. I could not find anything like this online. Is it a bug?

EDIT2:

If I modify the code as follows, everything works:

ID [a−zA−Z_][a−zA−Z0−9_]*
...
[a−zA−Z_][a−zA−Z0−9_]* { printf( "An identifier: %s\n", yytext ); return TOK_ID;}

According to the documentation, it should also work with the named definition.

Upvotes: 0

Views: 834

Answers (1)

K. A. Buhr

Reputation: 50819

This is a character encoding issue. In your copy-and-pasted source code, the characters that look like ASCII hyphens (-, U+002D) in your definition of ID:

ID [a−zA−Z_][a−zA−Z0−9_]*

aren't. Instead, they're Unicode minus signs (−, U+2212). If you replace the incorrect minus signs with the correct hyphens, the line will look like:

ID [a-zA-Z_][a-zA-Z0-9_]*

Depending on your font, if you look very closely, you may see a difference between the − in the first version and the - in the second.

Anyway, replace your ID definition with the second version above (or else retype it from scratch), and all should be well.
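
If you want to check a line of your .l file for this kind of lookalike character, here is a minimal sketch in C. It assumes the file is saved as UTF-8, where U+2212 is stored as the three bytes E2 88 92. As far as I know, flex works on bytes, so a class like [a−z] would then contain a, z and those three bytes rather than the range a to z, which would match what you describe in your first edit. The function names find_unicode_minus and count_non_ascii are made up for this example and are not part of flex.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the byte offset of the first UTF-8 encoded
 * U+2212 MINUS SIGN (bytes E2 88 92) in s, or -1 if there is none.
 * Assumes s is a NUL-terminated UTF-8 string. */
long find_unicode_minus(const char *s)
{
    static const char minus_utf8[] = "\xE2\x88\x92";
    const char *hit = strstr(s, minus_utf8);
    return hit ? (long)(hit - s) : -1L;
}

/* Hypothetical helper: counts the bytes in s that fall outside 7-bit ASCII.
 * A pattern line written with plain ASCII characters should yield 0. */
size_t count_non_ascii(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((unsigned char)*s > 0x7F)
            ++n;
    }
    return n;
}
```

Read the definitions section line by line (with fgets, for instance) and pass each line to these functions. Any line where count_non_ascii returns a nonzero value is worth retyping by hand.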

Upvotes: 3
