Reputation: 83

How are tokens managed?

Please explain the paragraph below:

Typically, with a lexer/parser, a token is a structure that holds not only the name of the token but also the characters/symbols that make up the token and the start and end position of that string of characters, with the start and end position being used for error reporting, highlighting, etc. A token will more likely hold just the start and end position of the characters/symbols that represent the token; the lexeme (the sequence of characters/symbols) can be derived from the start and end position as needed because the input is static.

I don't understand the start and end positions that the token holds. Please clarify.

Upvotes: 0

Answers (1)

Martin Törnwall

Reputation: 9599

There are many reasons to attach source code location information to tokens, including:

The parser needs them in order to report syntax errors in a user-friendly way. If the parser didn't have access to source code locations, the user could potentially have to scan through the entire file looking for a syntactic error consistent with what the parser reported, as there's no way for it to tell you where it occurred.
Later stages of the compiler may also need access to source code locations. You might, for instance, want to include debugging information in the binary. If that's the case, other data structures in the compiler (such as the AST and IR) will need to be modified to carry and forward this information as well.
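
To make the "start and end position" idea concrete, here is a minimal sketch in Python of a token that stores only its kind and two character offsets into the input. The names Token, lexeme, and line_and_column, as well as the "PLUS" kind, are made up for illustration and don't come from any particular lexer. Because the input never changes, the token's text and its line/column can both be recomputed from the offsets when needed:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Token:
    kind: str   # token name, e.g. "PLUS" (hypothetical naming)
    start: int  # offset of the token's first character in the source
    end: int    # offset one past the token's last character

    def lexeme(self, source: str) -> str:
        # The text is not stored in the token; it is sliced out of the
        # unchanged input whenever it is needed.
        return source[self.start:self.end]


def line_and_column(source: str, offset: int) -> Tuple[int, int]:
    """Convert a character offset into a 1-based (line, column) pair."""
    line = source.count("\n", 0, offset) + 1
    line_start = source.rfind("\n", 0, offset) + 1  # 0 if on the first line
    return line, offset - line_start + 1


source = "int x;\nFoo y;\n\nint z = x + y;"
plus_at = source.index("+")
plus = Token("PLUS", plus_at, plus_at + 1)

print(plus.lexeme(source))                   # the token's text
print(line_and_column(source, plus.start))   # where the token begins
```

With the four-line input used in this sketch, the + token resolves to line 4, column 11. A real lexer might instead store line and column directly in the token; that is a design choice, and offsets are just one option.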

You don't necessarily have to store all of it (starting and ending line/column) either. Often storing the line at which a token begins is quite sufficient. However, the more precise the location the better you can make the error reporting. Consider the following code snippet:

int x;
Foo y;

int z = x + y;

Let's say the + operator is not defined for (int, Foo). A compiler that only knows the line number of each token would be limited to reporting an error such as:

<file>:4: Error: Operator + is not defined for (int, Foo).

If we add a column number, we suddenly know the exact location of the + sign. This allows us to report an even better error message:

<file>:4: Error: Operator + is not defined for (int, Foo).
          In statement: int z = x + y;
          ------------------------^
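
As a rough sketch of how a message like the one above could be produced once the line and column are known, consider the following Python function. The function name render_error, its parameters, and the exact layout are assumptions made for illustration; real compilers format their diagnostics in their own ways.

```python
def render_error(filename: str, source: str, line: int, column: int,
                 message: str) -> str:
    """Format an error with the offending line and a caret under `column`.

    `line` and `column` are both 1-based.
    """
    text = source.splitlines()[line - 1]  # the source line to display
    label = "In statement: "
    indent = " " * 10
    # Dashes run under the label and under every character before the
    # offending column, so the caret lands directly below it.
    pointer = "-" * (len(label) + column - 1) + "^"
    return "\n".join([
        f"{filename}:{line}: Error: {message}",
        f"{indent}{label}{text}",
        f"{indent}{pointer}",
    ])


source = "int x;\nFoo y;\n\nint z = x + y;"
print(render_error("<file>", source, 4, 11,
                   "Operator + is not defined for (int, Foo)."))
```

Without the column, only the first line of this output could be generated; the caret line is exactly what the extra position information buys you.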

If this simple example doesn't convince you, I urge you to compare the errors produced by a recent clang to those produced by an old gcc.

Upvotes: 0
