Reputation: 17157
I'm interested in using lex to tokenize my input string, but I do not want it to be possible to "fail". Instead, I want to have some type of `DEFAULT` or `TEXT` token, which would contain all the non-matching characters between recognized tokens.
Anyone have experience with something like this?
Upvotes: 1
Views: 115
Reputation: 310850
To expand on @Chris Dodd's answer, the final rule in any lex script should be `. return yytext[0];`. Don't write single-character rules like `"+" return PLUS;`; just use the special characters you recognize directly in the grammar, e.g. `term: term '+' factor;`.
This practice keeps the scanner small and puts the decision about which single characters matter in the grammar, where any unexpected character then surfaces as an ordinary syntax error instead of a lexer failure.
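A minimal sketch of what that looks like in practice, assuming a toy grammar (the `NUMBER` token and the `expr`/`term` rules are illustrative, not anything from the question):

```
/* scanner.l */
%option noyywrap
%{
#include "y.tab.h"                  /* token codes generated by yacc -d / bison -d */
%}
%%
[0-9]+      { return NUMBER; }      /* named tokens only where you really need them */
[ \t\n]     { /* skip whitespace */ }
.           { return yytext[0]; }   /* final rule: hand any other character to the parser */
%%

/* parser.y */
%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }
%}
%token NUMBER
%%
expr : expr '+' term                /* the '+' character is used directly in the grammar */
     | term
     ;
term : NUMBER
     ;
%%
int main(void) { return yyparse(); }
```

With this setup a stray character such as `@` still reaches the parser; it simply triggers a syntax error there rather than making the scanner fail.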
Upvotes: 1
Reputation: 126175
Use the pattern `.` at the end of all your lex rules to match any character that isn't matched by any other rule. You may also need a `\n` rule to match newlines (a newline is the only character `.` doesn't match).
If you want to combine adjacent non-matching characters into a single token, that is harder, and is more easily done in the parser.
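One way to sketch that parser-side combining, assuming a flex/bison pair; the `WORD` and `CHAR` token names and the `text` rule are illustrative choices, not anything defined by lex or yacc:

```
/* scanner.l */
%option noyywrap
%{
#include "y.tab.h"                   /* provides the token codes and yylval */
%}
%%
[A-Za-z]+   { return WORD; }                            /* an example "recognized" token */
.|\n        { yylval.ch = yytext[0]; return CHAR; }     /* every other character, newline included */
%%

/* parser.y */
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int yylex(void);
void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }
%}
%union { char ch; char *str; }
%token WORD
%token <ch> CHAR
%type <str> text

/* bison reports one shift/reduce conflict for the text rule and resolves it
   by shifting, i.e. by greedily extending the current run of characters */
%expect 1
%%
input : /* empty */
      | input WORD   { printf("WORD\n"); }
      | input text   { printf("TEXT: \"%s\"\n", $2); free($2); }
      ;
text  : CHAR         { $$ = malloc(2); $$[0] = $1; $$[1] = '\0'; }
      | text CHAR    { size_t n = strlen($1);
                       $$ = realloc($1, n + 2);
                       $$[n] = $2; $$[n + 1] = '\0'; }
      ;
%%
int main(void) { return yyparse(); }
```

Built with something like `yacc -d parser.y && lex scanner.l && cc y.tab.c lex.yy.c`, this prints each recognized token and each collected run of everything in between.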
Upvotes: 1