Yyao

Reputation: 413

How can I determine the meaning of an ambiguous terminal when I tokenize my code?

In Python, the word 'in' is an operator in an expression such as 1 in [1,2,3]. But in the statement for i in range(10), it is a keyword that is part of the 'for' statement. I wrote a lexer based on regular expressions. I use the rule (\+|-|\*|/|is|in) to match operators and (for|in|if|elif|else) to match keywords. I don't know whether I should put 'in' in the operator rule or the keyword rule; either choice loses one of its meanings. It seems that I should resolve this during parsing, but I still need to give 'in' a label when tokenizing. What should I do?

Upvotes: 0

Views: 62

Answers (1)

rici

Reputation: 241931

Call it "token_in" :) It's usually better not to categorize in your lexer; the parser is responsible for analyzing the syntactic purpose of a token.

In any case, I don't see the point of the lexer producing a single token type for different keywords. if and else are syntactically distinct tokens, and the parser wants to know that it is seeing an if; the fact that it is presented with a "keyword" is not particularly useful to it.
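
To make that concrete, here is a rough sketch rather than a definitive design. The token names (token_in, token_for, NAME, NUMBER) and the functions tokenize and parse are hypothetical, invented for illustration. The lexer gives every reserved word its own token type, and a toy parser decides what 'in' means from context. Matching a whole identifier first and then looking it up in a keyword set also avoids matching 'in' at the start of a name like index.

```python
import re

# Hypothetical reserved words; each one becomes its own token type.
KEYWORDS = {"for", "in", "if", "elif", "else", "is"}

TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<OP>[-+*/()\[\],:])
  | (?P<WS>\s+)
""", re.VERBOSE)

def tokenize(source):
    """Return a list of (token_type, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "WS":
            continue
        if kind == "NAME" and text in KEYWORDS:
            kind = "token_" + text   # e.g. token_in, token_for
        elif kind == "OP":
            kind = text              # each punctuation mark is its own type
        tokens.append((kind, text))
    return tokens

def parse(tokens):
    """Toy parser: only here to show that context decides the role of 'in'."""
    if tokens and tokens[0][0] == "token_for":
        # for NAME in <rest>: here token_in is part of the statement syntax
        if len(tokens) < 3 or tokens[1][0] != "NAME" or tokens[2][0] != "token_in":
            raise SyntaxError("expected: for NAME in ...")
        return ("for", tokens[1][1], tokens[3:])
    for i, (kind, _) in enumerate(tokens):
        if kind == "token_in":
            # anywhere else in this toy grammar, token_in is a binary operator
            return ("in_op", tokens[:i], tokens[i + 1:])
    return ("expr", tokens)

for_stmt = parse(tokenize("for i in range(10)"))
membership = parse(tokenize("1 in [1,2,3]"))
```

The lexer emits the same token_in in both inputs; only the parser distinguishes the 'for' statement from the membership test. A real grammar would handle this with two separate rules that both mention token_in.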

Upvotes: 1
