Reputation: 157
I'm trying to define a simple tokenizer for a language in lex.
Basically, I want to define tokens for parentheses, commas, comparison ops, in/con/ncon ops, and logical ops. I want any other token to match the 'keywords' regexp, since that would represent a STRINGARG in my language.
Every time I feed it a string like 'A_FIELD', it gives me a LEXER ERROR. I want it to match 'keywords' and return a STRINGARG token.
Here is my .l file:
%{
#include "y.tab.h"
%}
lparen "("
rparen ")"
comma ","
comparison ("=="|"!="|">"|"<"|">="|"<=")
intok ("in"|"IN")
conncontok ("con"|"CON"|"ncon"|"NCON")
logical ("and"|"or"|"AND"|"OR"|"&"|"|")
keywords ( "(" | ")" | "," | "==" | "!=" | ">" | "<" | ">=" | "<=" | "in" | "IN" | "con" | "CON" | "ncon" | "NCON" | "and" | "AND" | "&" | "or"\
| "OR" | "|" )
%%
" " /* ignore whitespace */
{lparen} { return LPAREN; }
{rparen} { return RPAREN; }
{comma} { return COMMA; }
{comparison} { yylval.str = yytext; return COMPARISON; }
{intok} { return IN; }
{conncontok} { yylval.str = yytext; return CONNCON; }
{logical} { return LOGICAL; }
^keywords { yylval.str = yytext; return STRINGARG; }
. { printf("LEXER ERROR."); exit(1); }
%%
#ifndef yywrap
int yywrap() { return 1; }
#endif
Upvotes: 0
Views: 477
Reputation: 157
I found the answer to this problem.
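Basically, I wanted a STRINGARG to be anything other than one of the recognized tokens. My original last rule, ^keywords, did not express that: outside square brackets, ^ anchors a pattern to the start of a line, and keywords without braces is just the literal text "keywords", so 'A_FIELD' fell through to the error rule. Negation only works inside a character class, so I should have used a character class in the last rule instead of a named definition. With the rules set up as follows, everything worked: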
%%
" " /* ignore whitespace */
{lparen} { return LPAREN; }
{rparen} { return RPAREN; }
{comma} { return COMMA; }
{comparison} { yylval.str = yytext; return COMPARISON; }
{intok} { return IN; }
{conncontok} { yylval.str = yytext; return CONNCON; }
{logical} { return LOGICAL; }
[^ \t\n]+ { yylval.str = yytext; return STRINGARG; }
. { printf( "Lexer error." ); exit(1); }
%%
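For anyone wondering why the catch-all rule does not swallow the keywords, here is a rough sketch in C of the two tie-breaking rules lex applies: the longest match wins, and on a tie the rule listed first wins. It is not what lex generates, only an illustration. The TOK_ names and the next_token helper are made up for this sketch (a real build would use the tokens from y.tab.h), and whitespace skipping is assumed to happen in the caller.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in token kinds for this sketch; a real build would take them
   from y.tab.h instead. */
enum token {
    TOK_NONE, TOK_LPAREN, TOK_RPAREN, TOK_COMMA, TOK_COMPARISON,
    TOK_IN, TOK_CONNCON, TOK_LOGICAL, TOK_STRINGARG
};

struct literal {
    const char *text;
    enum token kind;
};

/* The fixed-string rules, listed in the same order as in the .l file. */
static const struct literal literals[] = {
    {"(", TOK_LPAREN}, {")", TOK_RPAREN}, {",", TOK_COMMA},
    {"==", TOK_COMPARISON}, {"!=", TOK_COMPARISON}, {">", TOK_COMPARISON},
    {"<", TOK_COMPARISON}, {">=", TOK_COMPARISON}, {"<=", TOK_COMPARISON},
    {"in", TOK_IN}, {"IN", TOK_IN},
    {"con", TOK_CONNCON}, {"CON", TOK_CONNCON},
    {"ncon", TOK_CONNCON}, {"NCON", TOK_CONNCON},
    {"and", TOK_LOGICAL}, {"or", TOK_LOGICAL}, {"AND", TOK_LOGICAL},
    {"OR", TOK_LOGICAL}, {"&", TOK_LOGICAL}, {"|", TOK_LOGICAL},
};

/* Hypothetical helper: classify the token at the start of s and store the
   match length in *len. Whitespace skipping is left to the caller. */
enum token next_token(const char *s, size_t *len)
{
    enum token best = TOK_NONE;
    size_t best_len = 0;
    size_t i, run;

    for (i = 0; i < sizeof literals / sizeof literals[0]; i++) {
        size_t n = strlen(literals[i].text);
        /* Strictly longer only, so on a tie the earlier rule is kept. */
        if (strncmp(s, literals[i].text, n) == 0 && n > best_len) {
            best = literals[i].kind;
            best_len = n;
        }
    }

    /* Catch-all rule [^ \t\n]+ : a run of non-whitespace characters. */
    run = strcspn(s, " \t\n");
    if (run > best_len) {
        best = TOK_STRINGARG;
        best_len = run;
    }

    *len = best_len;
    return best;
}
```

With these rules, next_token("in", &n) gives TOK_IN because the tie goes to the earlier rule, while next_token("index", &n) gives TOK_STRINGARG because the catch-all match is longer. One thing to watch: next_token("(A_FIELD", &n) also gives TOK_STRINGARG with n == 8, so with a catch-all of [^ \t\n]+ the punctuation and operator rules only fire when tokens are separated by whitespace.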
Upvotes: 1