Reputation: 1295
I am trying to implement a very simple parser using flex. I am currently stuck on identifier recognition. Here is my code:
ID [a−zA−Z_][a−zA−Z0−9_]*
...
{ID} { printf( "An identifier: %s\n", yytext ); return TOK_ID;}
However, I only get the first letter of the identifier. For example, if I try to parse:
int _underscore ;
The result is:
An identifier: _
Any advice?
EDIT:
After a closer look, I figured out that the code only recognizes identifiers made up of a, z, A, Z, and _, which are the characters that appear explicitly in the regular expression. I did not find anything like that online. Is that a bug?
EDIT2:
If I modify the code as follows, everything works:
ID [a−zA−Z_][a−zA−Z0−9_]*
...
[a−zA−Z_][a−zA−Z0−9_]* { printf( "An identifier: %s\n", yytext ); return TOK_ID;}
According to the documentation, it should also work the other way.
Upvotes: 0
Views: 834
Reputation: 50819
This is a character encoding issue. In your copy-and-pasted source code, the things that look like ASCII hyphens (-, U+002D) in your definition of ID:
ID [a−zA−Z_][a−zA−Z0−9_]*
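aren't. Instead, they're Unicode minus signs (−, U+2212). Flex only treats the ASCII hyphen as a range operator inside a character class, so [a−z] is not a range: it matches just the characters listed literally, which is why only a, z, A, Z, and _ were recognized. If you replace the incorrect minus signs with the correct hyphens, the line will look like: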
ID [a-zA-Z_][a-zA-Z0-9_]*
Depending on your font, if you look very closely, you may see a difference between the − in the first version and the - in the second.
Anyway, replace your ID definition with the second version above (or else retype it from scratch), and all should be well.
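Since the two characters can be hard to tell apart by eye, one way to check a suspicious line is to look for bytes outside the ASCII range. The following is only a minimal sketch, assuming the source is UTF-8 encoded (where U+2212 is stored as the three bytes E2 88 92); the function name first_non_ascii is a hypothetical helper, not part of flex:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of flex): returns the offset of the
   first non-ASCII byte (any byte above 0x7F) in a NUL-terminated
   string, or -1 if every byte is plain ASCII. */
long first_non_ascii(const char *s)
{
    assert(s != NULL);
    for (size_t i = 0; s[i] != '\0'; i++) {
        if ((unsigned char)s[i] > 0x7F) {
            return (long)i;
        }
    }
    return -1;
}
```

Calling it on each line of the .l file and reporting any result other than -1 should point at the offending character. A definition typed with real hyphens contains only ASCII bytes, so it would pass the check.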
Upvotes: 3