Reputation: 53551
Walter Bright's article on C++ compilation mentions these two phrases:
"Conversion to preprocessing tokens."
What is the initial token? What does a preprocessing token look like?
"Conversion of preprocessing tokens to C++ tokens." What is a C++ token, and why isn't the source converted into C++ tokens in the first place?
Reference: http://www.drdobbs.com/blogs/cpp/228701711
Upvotes: 3
Views: 2103
Reputation: 4143
A simpler explanation.
As you may know, many compilers have a lexical analysis phase, in which the source code is split into tokens.
This source code:
void main()
{
int x = -3 - -5;
printf("Hello World");
} // void main()
is transformed into something similar to this:
+--------------+------------------+
| TOKEN        | TEXT             |
+--------------+------------------+
| void         | "void"           |
| identifier   | "main"           |
| leftcurly    | "{"              |
| identifier   | "int"            |
| identifier   | "x"              |
| assign       | "="              |
| minus        | "-"              |
| integer      | "3"              |
| minus        | "-"              |
| minus        | "-"              |
| integer      | "5"              |
| semicolon    | ";"              |
| identifier   | "printf"         |
| leftpar      | "("              |
| string       | "Hello World"    |
| rightpar     | ")"              |
| semicolon    | ";"              |
| rightcurly   | "}"              |
| comment      | "// void main()" |
+--------------+------------------+
Each of these pieces of text, called "tokens", has a meaning.
Sometimes, in other parts of the compilation process, tokens may be replaced by other tokens:
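To make the splitting step concrete, here is a minimal sketch of a lexer that produces the kind of table shown above. The `Token` struct, the `tokenize` and `punctKind` functions, and the kind names are hypothetical, chosen only to mirror the table; a real compiler's lexer is far more elaborate.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token record: a kind name plus the matched text.
struct Token {
    std::string kind;
    std::string text;
};

// Maps a single punctuation character to the kind names used in the table.
std::string punctKind(char c) {
    switch (c) {
        case '(': return "leftpar";
        case ')': return "rightpar";
        case '{': return "leftcurly";
        case '}': return "rightcurly";
        case '=': return "assign";
        case '-': return "minus";
        case ';': return "semicolon";
        default:  return "unknown";
    }
}

// Splits a small C-like snippet into tokens. Only the token kinds that
// appear in the table are recognized; everything else is "unknown".
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    auto uc = [&](std::size_t k) { return static_cast<unsigned char>(src[k]); };
    while (i < src.size()) {
        if (std::isspace(uc(i))) { ++i; continue; }
        std::size_t start = i;
        if (std::isalpha(uc(i)) || src[i] == '_') {
            while (i < src.size() && (std::isalnum(uc(i)) || src[i] == '_')) ++i;
            std::string word = src.substr(start, i - start);
            out.push_back({word == "void" ? "void" : "identifier", word});
        } else if (std::isdigit(uc(i))) {
            while (i < src.size() && std::isdigit(uc(i))) ++i;
            out.push_back({"integer", src.substr(start, i - start)});
        } else if (src[i] == '"') {
            ++i;  // skip the opening quote
            while (i < src.size() && src[i] != '"') ++i;
            assert(i < src.size() && "unterminated string literal");
            out.push_back({"string", src.substr(start + 1, i - start - 1)});
            ++i;  // skip the closing quote
        } else if (src[i] == '/' && i + 1 < src.size() && src[i + 1] == '/') {
            while (i < src.size() && src[i] != '\n') ++i;
            out.push_back({"comment", src.substr(start, i - start)});
        } else {
            out.push_back({punctKind(src[i]), std::string(1, src[i])});
            ++i;
        }
    }
    return out;
}
```

Calling `tokenize("int x = -3 - -5;")` yields nine tokens, with every `-` still classified as a plain `minus`.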
+--------------+------------------+
| TOKEN        | TEXT             |
+--------------+------------------+
| void         | "void"           |
| functiondec  | "main"           |
| leftcurly    | "{"              |
| type         | "int"            |
| variabledec  | "x"              |
| assign       | "="              |
| negative     | "-"              |
| integer      | "3"              |
| subtract     | "-"              |
| negative     | "-"              |
| integer      | "5"              |
| semicolon    | ";"              |
| functioncall | "printf"         |
| leftpar      | "("              |
| string       | "Hello World"    |
| rightpar     | ")"              |
| semicolon    | ";"              |
| rightcurly   | "}"              |
| comment      | "// void main()" |
+--------------+------------------+
The conversion of the "minus" token into either a "negative sign" token or a "subtraction" token is a good example of this "preprocessing token" to "final token" conversion.
This is a very conceptual explanation. You may want to read the more detailed technical information in the documentation of your specific compiler.
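The reclassification of "minus" could be sketched as a second pass over the token list. This is only an illustration of the conceptual idea in this answer, not how any particular compiler works; `Token`, `classifyMinus`, and the kind names are hypothetical. The rule assumed here is that a `-` is a subtraction when the previous token can end an operand, and a negative sign otherwise.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token record: a kind name plus the matched text.
struct Token {
    std::string kind;
    std::string text;
};

// Replaces each generic "minus" token with "subtract" or "negative".
// Assumed rule: a minus is binary only if the previous token can end an
// operand (an integer, an identifier, or a closing parenthesis).
std::vector<Token> classifyMinus(std::vector<Token> toks) {
    bool prevEndsOperand = false;
    for (Token& t : toks) {
        if (t.kind == "minus") {
            t.kind = prevEndsOperand ? "subtract" : "negative";
            prevEndsOperand = false;  // an operator never ends an operand
        } else {
            prevEndsOperand = t.kind == "integer" ||
                              t.kind == "identifier" ||
                              t.kind == "rightpar";
        }
    }
    return toks;
}
```

For the tokens of `x = -3 - -5;` the three `minus` tokens become `negative`, `subtract`, and `negative`, in that order.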
Cheers
Upvotes: 2
Reputation: 272762
A preprocessing token is an element of the grammar of the preprocessor. From [lex.pptoken] in the C++ standard:
preprocessing-token:
- header-name
- identifier
- pp-number
- character-literal
- user-defined-character-literal
- string-literal
- user-defined-string-literal
- preprocessing-op-or-punc
- each non-white-space character that cannot be one of the above
...
A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6.
So the "conversion to preprocessing tokens" is the process of lexing the translation unit and identifying individual tokens.
C++ tokens (really just "tokens") are listed in [lex.token]:
token:
- identifier
- keyword
- literal
- operator
- punctuator
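These exist only from translation phase 7 onward, after the preprocessor has finished its work: directives have been executed and macros expanded (phase 4), and adjacent string literals have been concatenated (phase 6). Until then the preprocessor needs the looser preprocessing-token grammar, for example so that ## can paste two preprocessing tokens into a new one.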
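A small sketch of why the two stages are kept apart: the operations below all work on preprocessing tokens, and only their results become ordinary tokens. The macro names `PASTE` and `STRINGIZE` and the functions `spelled` and `joined` are made up for this illustration.

```cpp
#include <cassert>
#include <string>

#define PASTE(a, b) a##b
#define STRINGIZE(x) #x

// `foo` (identifier) and `42` (pp-number) are two preprocessing tokens.
// ## pastes them into one preprocessing token, foo42, which only later
// becomes an ordinary identifier token.
int PASTE(foo, 42) = 7;

// # turns the spelling of the argument's preprocessing tokens into a
// string literal; nothing is evaluated, so the result is "1 + 2".
std::string spelled() { return STRINGIZE(1 + 2); }

// Adjacent string literals are concatenated before the conversion to
// tokens, so the compiler proper sees a single literal here.
const char* joined() { return "pre" "processing"; }

// A pp-number is looser than any numeric literal: 0xe+1 is lexed as ONE
// preprocessing token, which is not a valid literal, so writing
// `int n = 0xe+1;` is ill-formed, while `0xe + 1` is fine.
int spaced() { return 0xe + 1; }
```

After preprocessing, the first declaration reads `int foo42 = 7;`, `spelled()` returns `"1 + 2"`, and `joined()` returns `"preprocessing"`.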
For more information on the entire process, I suggest reading [lex.phases] in the C++ standard.
Upvotes: 4