unj2

Reputation: 53551

What are the different token types in C++ compilation?

Walter Bright's article on C++ Compilation talks about these two phrases

"Conversion to preprocessing tokens."
What is the initial token? What does a preprocessing token look like?

"Conversion of preprocessing tokens to C++ tokens."
What is a C++ token, and why isn't the source converted into C++ tokens in the first place?

Reference: http://www.drdobbs.com/blogs/cpp/228701711

Upvotes: 3

Views: 2103

Answers (2)

umlcat

Reputation: 4143

A simpler explanation.

As you may know, many compilers have a lexical analysis phase, in which the source code is split into tokens.

This source code:

void main()
{
  int x = -3 - -5;
  printf("Hello World");
} // void main()

Is transformed into something similar to this:

+--------------+------------------+ 
|  TOKEN       |  TEXT            | 
+--------------+------------------+ 
|  void        | "void"           | 
+--------------+------------------+ 
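|  identifier  | "main"           | 
+--------------+------------------+ 
|  leftpar     | "("              | 
+--------------+------------------+ 
|  rightpar    | ")"              | 
+--------------+------------------+ 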
|  leftcurly   | "{"              | 
+--------------+------------------+ 
|  identifier  | "int"            | 
+--------------+------------------+ 
|  identifier  | "x"              | 
+--------------+------------------+ 
|  assign      | "="              | 
+--------------+------------------+ 
|  minus       | "-"              | 
+--------------+------------------+ 
|  integer     | "3"              | 
+--------------+------------------+ 
|  minus       | "-"              | 
+--------------+------------------+ 
|  minus       | "-"              | 
+--------------+------------------+ 
|  integer     | "5"              | 
+--------------+------------------+ 
|  semicolon   | ";"              | 
+--------------+------------------+ 
|  identifier  | "printf"         | 
+--------------+------------------+ 
|  leftpar     | "("              | 
+--------------+------------------+ 
|  string      | "Hello World"    | 
+--------------+------------------+ 
|  rightpar    | ")"              | 
+--------------+------------------+ 
|  semicolon   | ";"              | 
+--------------+------------------+ 
|  rightcurly  | "}"              | 
+--------------+------------------+ 
|  comment     | "// void main()" | 
+--------------+------------------+ 

Each of these pieces of text, called "tokens", has a meaning.
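
To make the splitting step concrete, here is a minimal sketch of a hand-written lexer that produces roughly the rows of the table above. The names `Token`, `punct_kind`, `tokenize` and `lexer_example` are made up for this illustration, and the rules are heavily simplified (no escape sequences, no multi-character operators), so treat it as a conceptual toy rather than what any real compiler does.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record: a kind name plus the matched text.
struct Token {
    std::string kind;
    std::string text;
};

// Maps a single punctuation character to the kind names used in the table.
std::string punct_kind(char c) {
    switch (c) {
        case '(': return "leftpar";
        case ')': return "rightpar";
        case '{': return "leftcurly";
        case '}': return "rightcurly";
        case '=': return "assign";
        case '-': return "minus";
        case ';': return "semicolon";
        default:  return "other";
    }
}

// Splits `src` into tokens, skipping whitespace between them.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // Word: letters, digits and underscores.
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) ++i;
            std::string word = src.substr(start, i - start);
            out.push_back({word == "void" ? "void" : "identifier", word});
        } else if (std::isdigit(c)) {
            // Run of decimal digits.
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back({"integer", src.substr(start, i - start)});
        } else if (c == '"') {
            // String literal up to the next quote (escapes are not handled).
            ++i;
            while (i < src.size() && src[i] != '"') ++i;
            if (i < src.size()) ++i;  // consume the closing quote
            out.push_back({"string", src.substr(start, i - start)});
        } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
            // Line comment up to the end of the line.
            while (i < src.size() && src[i] != '\n') ++i;
            out.push_back({"comment", src.substr(start, i - start)});
        } else {
            out.push_back({punct_kind(src[i]), std::string(1, src[i])});
            ++i;
        }
    }
    return out;
}

// Usage example: the declaration from the sample program.
void lexer_example() {
    std::vector<Token> t = tokenize("int x = -3 - -5;");
    assert(t.size() == 9);          // int x = - 3 - - 5 ;
    assert(t[5].kind == "minus");   // every '-' is still a plain "minus" here
    assert(t[6].kind == "minus");
}
```

At this stage all three `-` characters get the same kind; nothing has looked at the surrounding tokens yet.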


Sometimes, in other parts of the compilation process, tokens may be replaced by other tokens:

+--------------+------------------+ 
|  TOKEN       |  TEXT            | 
+--------------+------------------+ 
|  void        | "void"           | 
+--------------+------------------+ 
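|  functiondec | "main"           | 
+--------------+------------------+ 
|  leftpar     | "("              | 
+--------------+------------------+ 
|  rightpar    | ")"              | 
+--------------+------------------+ 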
|  leftcurly   | "{"              | 
+--------------+------------------+ 
|  type        | "int"            | 
+--------------+------------------+ 
|  variabledec | "x"              | 
+--------------+------------------+ 
|  assign      | "="              | 
+--------------+------------------+ 
|  negative    | "-"              | 
+--------------+------------------+ 
|  integer     | "3"              | 
+--------------+------------------+ 
|  subtract    | "-"              | 
+--------------+------------------+ 
|  negative    | "-"              | 
+--------------+------------------+ 
|  integer     | "5"              | 
+--------------+------------------+ 
|  semicolon   | ";"              | 
+--------------+------------------+ 
| functioncall | "printf"         | 
+--------------+------------------+ 
|  leftpar     | "("              | 
+--------------+------------------+ 
|  string      | "Hello World"    | 
+--------------+------------------+ 
|  rightpar    | ")"              | 
+--------------+------------------+ 
|  semicolon   | ";"              | 
+--------------+------------------+ 
|  rightcurly  | "}"              | 
+--------------+------------------+ 
|  comment     | "// void main()" | 
+--------------+------------------+ 

The conversion from the "minus" token to either a "negative sign" token or a "subtraction" token is a good illustration of the idea of converting a "preprocess token" into a "final token".

This is a very conceptual explanation. You may want to read the more detailed technical information in your specific compiler's documentation.
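
As a rough sketch of that reclassification, the function below rewrites each "minus" kind as "negative" or "subtract" by looking at the token before it. The function names and the rule itself (a minus is binary only right after an integer, an identifier or a closing parenthesis) are assumptions made for this illustration. In real C++ compilers the unary/binary distinction is normally made by the parser from grammar context, so this is only a model of the idea.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Rewrites every "minus" kind as "negative" (unary) or "subtract" (binary).
// Assumed rule: a minus is binary only when the previous token can end an operand.
std::vector<std::string> classify_minus(const std::vector<std::string>& kinds) {
    std::vector<std::string> out;
    for (const std::string& k : kinds) {
        if (k != "minus") {
            out.push_back(k);
            continue;
        }
        bool after_operand = !out.empty() &&
            (out.back() == "integer" || out.back() == "identifier" ||
             out.back() == "rightpar");
        out.push_back(after_operand ? "subtract" : "negative");
    }
    return out;
}

// Usage example: the kinds for `= -3 - -5 ;` from the first table.
void classify_example() {
    std::vector<std::string> in = {"assign", "minus", "integer", "minus",
                                   "minus", "integer", "semicolon"};
    std::vector<std::string> out = classify_minus(in);
    assert(out[1] == "negative");  // -3
    assert(out[3] == "subtract");  // 3 - ...
    assert(out[4] == "negative");  // -5
}
```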

Cheers

Upvotes: 2

Oliver Charlesworth

Reputation: 272762

A preprocessing token is an element of the grammar of the preprocessor. From [lex.pptoken] in the C++ standard:

preprocessing-token:

  • header-name
  • identifier
  • pp-number
  • character-literal
  • user-defined-character-literal
  • string-literal
  • user-defined-string-literal
  • preprocessing-op-or-punc
  • each non-white-space character that cannot be one of the above

...

A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6.

So the "conversion to preprocessing tokens" is the process of lexing the translation unit and identifying individual tokens.

C++ tokens (really just "tokens") are listed in [lex.token]:

token:

  • identifier
  • keyword
  • literal
  • operator
  • punctuator

These only exist from translation phase 7 onward, after the earlier phases (macro expansion and so on) have completed; that is where each preprocessing token is converted into a token.
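
A small example may help show why the two kinds are kept apart. The macro names `PP_STR` and `PP_CAT`, and the function `pp_example`, are made up for this sketch. The preprocessor operators `#` and `##` work on preprocessing tokens, so they can handle text that is not (or not yet) a valid token.

```cpp
#include <cassert>
#include <cstring>

#define PP_STR(x) #x         // stringize one macro argument
#define PP_CAT(a, b) a##b    // paste two preprocessing tokens into one

// `1.2.3` is a single valid pp-number, although it could never be converted
// into a valid literal token. It is stringized during preprocessing, so it
// never has to be.
const char* const kOdd = PP_STR(1.2.3);

// `va` and `lue` are pasted into the single identifier `value` before any
// preprocessing token is converted into a token.
int PP_CAT(va, lue) = 42;

// Usage example: both results are visible to ordinary code.
void pp_example() {
    assert(std::strcmp(kOdd, "1.2.3") == 0);
    assert(value == 42);
}
```

If the source were turned into final tokens straight away, `1.2.3` would have to be rejected before the macro ever saw it.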

For more information on the entire process, I suggest reading [lex.phases] in the C++ standard.

Upvotes: 4
