ryeguy

Reputation: 66851

Is this the job of the lexer?

Let's say I was lexing a Ruby method definition:

def print_greeting(greeting = "hi")  
end

Is it the lexer's job to maintain state and emit relevant tokens, or should it be relatively dumb? Notice in the above example the greeting param has a default value of "hi". In a different context, greeting = "hi" is variable assignment which sets greeting to "hi". Should the lexer emit generic tokens such as IDENTIFIER EQUALS STRING, or should it be context-aware and emit something like PARAM_NAME EQUALS STRING?

Upvotes: 4

Views: 540

Answers (4)

Gene Bushuyev

Reputation: 5538

The distinction between lexical analysis and parsing is an arbitrary one. In many cases you wouldn't want a separate step at all. That said, since performance is usually the most important issue (otherwise parsing would be a mostly trivial task), you need to decide, and probably measure, whether additional processing during lexical analysis is justified. There is no general answer.

Upvotes: 2

umlcat

Reputation: 4143

I don't work with Ruby, but I do work with compiler and programming language design.

Both approaches work, but in practice it is easier to emit generic identifiers for variables, parameters, and reserved words (a "dumb lexer" or "dumb scanner").

Later, you can "cast" those generic identifiers into other tokens, sometimes in your parser.

Sometimes the lexer or scanner, rather than the parser, has a code section that lets you perform several "semantic" operations, including casting a generic identifier into a keyword, variable, type identifier, or whatever else. Your lexer rule detects a generic identifier token but returns another token to the parser.
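To make the "casting" idea concrete, here is a minimal sketch in Ruby. The names KEYWORDS, classify_word, and lex_words are made up for this illustration, and the keyword list is only a small assumed subset. The rule matches every word as a generic identifier first, then a lookup decides which token is actually handed to the parser.

```ruby
require 'set'

# Illustrative subset of reserved words (an assumption, not Ruby's full list).
KEYWORDS = Set.new(%w[def end if else while]).freeze

# Takes a lexeme that matched the generic identifier rule and "casts" it:
# reserved words become their own token type, everything else stays IDENTIFIER.
def classify_word(lexeme)
  if KEYWORDS.include?(lexeme)
    [lexeme.upcase.to_sym, lexeme]
  else
    [:IDENTIFIER, lexeme]
  end
end

# Finds every identifier-shaped word in the source and classifies it.
# Punctuation and strings are ignored to keep the sketch short.
def lex_words(source)
  source.scan(/[A-Za-z_]\w*/).map { |word| classify_word(word) }
end
```

Calling lex_words("def print_greeting end") would yield a DEF token, an IDENTIFIER token, and an END token, even though all three were matched by the same generic pattern.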

Another similar, common case is when a language uses "+" and "-" both as binary operators and as unary sign operators.
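A rough sketch of how that case can be handled with a dumb lexer: the lexer always emits the same PLUS or MINUS token, and the parser decides from position whether it is unary or binary. TinyParser and the token representation (Integer objects for numbers, symbols for operators) are assumptions made for this example only.

```ruby
# Tokens are assumed to come from a dumb lexer:
# numbers as Integer objects, operators as the symbols :PLUS and :MINUS.
class TinyParser
  def initialize(tokens)
    @tokens = tokens.dup
  end

  # expression := term (("+" | "-") term)*
  # An operator seen here, after a complete term, is binary.
  def parse_expression
    node = parse_term
    while [:PLUS, :MINUS].include?(@tokens.first)
      operator = @tokens.shift
      node = [:binary, operator, node, parse_term]
    end
    node
  end

  # term := ("+" | "-") term | NUMBER
  # An operator seen here, where an operand is expected, is unary.
  def parse_term
    token = @tokens.shift
    case token
    when :PLUS, :MINUS then [:unary, token, parse_term]
    when Integer       then [:number, token]
    else raise "unexpected token #{token.inspect}"
    end
  end
end
```

For the token stream of 3 - -2, the first MINUS ends up in a binary node and the second in a unary node, although the lexer emitted the same token type for both.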

Upvotes: 1

Zuljin

Reputation: 2640

I think the lexer should be "dumb" and in your case should return something like this: DEF IDENTIFIER OPEN_PARENTHESIS IDENTIFIER EQUALS STRING CLOSE_PARENTHESIS END. The parser should do the validation; that is the reason for splitting the responsibilities.
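As a minimal sketch of such a dumb lexer in Ruby, using the standard library's StringScanner: the TOKEN_PATTERNS table and the dumb_lex method are hypothetical names for this example, and the patterns only cover what the question's snippet needs (no escapes in strings, no other operators).

```ruby
require 'strscan'

# Hypothetical token table. Order matters: keywords are tried before
# the generic identifier pattern.
TOKEN_PATTERNS = [
  [:DEF,               /def\b/],
  [:END,               /end\b/],
  [:IDENTIFIER,        /[A-Za-z_]\w*/],
  [:STRING,            /"[^"]*"/],
  [:EQUALS,            /=/],
  [:OPEN_PARENTHESIS,  /\(/],
  [:CLOSE_PARENTHESIS, /\)/]
].freeze

# Returns an array of [type, text] pairs. The lexer keeps no state about
# where it is in the grammar; it only matches the next pattern.
def dumb_lex(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/) # whitespace is not a token in this sketch
    type, _pattern = TOKEN_PATTERNS.find { |_, pattern| scanner.scan(pattern) }
    raise "unexpected character at position #{scanner.pos}" unless type
    tokens << [type, scanner.matched]
  end
  tokens
end
```

Fed the method definition from the question, this produces exactly the token types listed above, and it produces the same IDENTIFIER EQUALS STRING triple for a plain greeting = "hi" assignment.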

Upvotes: 3

Jasper Bekkers

Reputation: 6809

I tend to make the lexer as stupid as I possibly can and would thus have it emit the IDENTIFIER EQUALS STRING tokens. At lexical analysis time there is (most of the time) no information available about what the tokens should represent. Having grammar rules like this in the lexer only pollutes it with (very) complex syntax rules, and that is the parser's job.
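To illustrate where the context lives, here is a small sketch of a parser giving the same IDENTIFIER EQUALS STRING tokens two different meanings. It assumes tokens arrive as [type, text] pairs, and the helper names (take_token, parse_name_equals_string, parse_statement) and the result shapes are invented for this example. It only handles the two forms from the question.

```ruby
# Removes the next token and checks its type; returns the token's text.
def take_token(tokens, type)
  token = tokens.shift
  raise "expected #{type}, got #{token.inspect}" unless token && token[0] == type
  token[1]
end

# IDENTIFIER EQUALS STRING: the same token sequence in both contexts.
def parse_name_equals_string(tokens)
  name = take_token(tokens, :IDENTIFIER)
  take_token(tokens, :EQUALS)
  value = take_token(tokens, :STRING)
  [name, value]
end

# The parser, not the lexer, knows whether it is inside a parameter list.
def parse_statement(tokens)
  tokens = tokens.dup
  if tokens.first && tokens.first[0] == :DEF
    take_token(tokens, :DEF)
    method_name = take_token(tokens, :IDENTIFIER)
    take_token(tokens, :OPEN_PARENTHESIS)
    name, value = parse_name_equals_string(tokens)
    take_token(tokens, :CLOSE_PARENTHESIS)
    take_token(tokens, :END)
    [:method_def, method_name, [:param_with_default, name, value]]
  else
    name, value = parse_name_equals_string(tokens)
    [:assignment, name, value]
  end
end
```

The lexer never has to emit a PARAM_NAME token: the parser labels the triple as a parameter default or as an assignment depending on which rule it was in when it saw the tokens.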

Upvotes: 5
