Sir Bohumil

Reputation: 241

Should tokenizer return language keywords?

I am writing a toy compiler for a toy language; let's suppose it has JavaScript syntax.

Let's say that the source file is:

var val = 123;

My simple compiler will consist of a Tokenizer and a Parser (for now).

Should the Tokenizer return entire language keywords (e.g. var), or return them letter by letter (v, a, r)?

Sooner or later I will have to recognize keywords, literals, etc., and I wonder where this kind of work belongs.

Upvotes: 2

Views: 157

Answers (2)

olydis

Reputation: 3310

The tokenizer should usually return entire keywords as single tokens.

There is no disadvantage in doing so. Once your tokenizer has decided that something is a language keyword (and not a number or similar), why "weaken" that information by splitting what you have already detected back into parts? ;)

More generally: don't hesitate to let the tokenizer output building blocks that are as large as possible, as long as it does not attach any further meaning to them; that should be left to the parser.
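
To illustrate the division of labour, here is a minimal sketch of a parser that receives whole tokens and only then assigns meaning to them. The token shape (a `(kind, text)` pair), the kind names, and the function `parse_var_declaration` are all assumptions made up for this example, not part of any particular compiler.

```python
def parse_var_declaration(tokens):
    """Parse `var <name> = <integer> ;` from a list of (kind, text) pairs.

    The token kinds used here ("keyword", "identifier", ...) are hypothetical.
    """
    position = 0

    def expect(kind, text=None):
        # Consume the next token if it has the expected kind (and text, if given).
        nonlocal position
        if position >= len(tokens):
            raise SyntaxError(f"expected {kind}, found end of input")
        found_kind, found_text = tokens[position]
        if found_kind != kind or (text is not None and found_text != text):
            raise SyntaxError(
                f"expected {kind}, found {found_kind} {found_text!r}"
            )
        position += 1
        return found_text

    # The parser, not the tokenizer, decides that this sequence is a declaration.
    expect("keyword", "var")
    name = expect("identifier")
    expect("operator", "=")
    value = int(expect("integer"))
    expect("punctuation", ";")
    return {"kind": "VarDeclaration", "name": name, "value": value}


# Tokens as a tokenizer might deliver them for `var val = 123;`.
tokens = [
    ("keyword", "var"),
    ("identifier", "val"),
    ("operator", "="),
    ("integer", "123"),
    ("punctuation", ";"),
]
print(parse_var_declaration(tokens))
```

Because the parser sees `var` as one token, its grammar rules stay short; had it received `v`, `a`, `r`, it would have to redo the tokenizer's work before it could even begin.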

Upvotes: 3

paxdiablo

Reputation: 881423

The whole point of a tokeniser is to take your input stream (of characters) and give you tokens that you can use for grammatical analysis.

Hence you would expect the tokeniser to give you something along the lines of:

T_KEYWORD_VAR
T_VARIABLE(val)
T_KEYWORD_EQUALS
T_INTEGER(123)
T_KEYWORD_SEMICOLON
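
A tokeniser producing that output could be sketched roughly as follows. This is only one possible approach (a single regular expression with named groups), and it assumes a tiny language containing just `var`, names, integers, `=` and `;`. The function `tokenize` and the lookup tables are hypothetical; the token names are taken from the list above.

```python
import re

# Lookup tables mapping lexemes to the token names listed above.
KEYWORDS = {"var": "T_KEYWORD_VAR"}
PUNCTUATION = {"=": "T_KEYWORD_EQUALS", ";": "T_KEYWORD_SEMICOLON"}

# One alternative per lexeme class; whitespace is matched so it can be skipped.
TOKEN_RE = re.compile(r"""
    (?P<space>\s+)
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<integer>\d+)
  | (?P<punct>[=;])
""", re.VERBOSE)


def tokenize(source):
    """Return a list of (token_type, value) pairs for the source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at offset {pos}"
            )
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "space":
            continue
        if kind == "word":
            # Read the whole word first, then classify it as keyword or name.
            if text in KEYWORDS:
                tokens.append((KEYWORDS[text], None))
            else:
                tokens.append(("T_VARIABLE", text))
        elif kind == "integer":
            tokens.append(("T_INTEGER", int(text)))
        else:
            tokens.append((PUNCTUATION[text], None))
    return tokens


for token in tokenize("var val = 123;"):
    print(token)
```

Note that the word is matched in full before it is checked against the keyword table, so a name such as `variable` comes out as a single `T_VARIABLE` token rather than as the keyword `var` followed by `iable`.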

Upvotes: 4
