C++ token types

Question

I am assuming that the C++ token types (as per 2.7 Tokens [lex.token]) do not form intersecting sets (i.e. int is considered to belong only to the keyword token type, not to both the keyword and identifier token types). With that in mind, the following question arises.

C++11 quote:

2.2 Phases of translation [lex.phases]

7 White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. (2.7). The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.

So the syntactic and semantic analysis of the C++ text is performed AFTER the text is split into tokens.

Another C++11 quote:

2.7 Tokens [lex.token]

token:
identifier
keyword
literal
operator
punctuator

I have not found a definition of the operator and punctuator grammar non-terminals anywhere in the standard. In any case, according to 2.12 Keywords and 2.13 Operators and punctuators, the token new can be either a keyword or an operator token. How can a C++ compiler possibly determine the type of the new token BEFORE performing syntactic and semantic analysis of the code?

ecatmur · Accepted Answer

new and delete are the overloadable operators whose name is formed of a single token.

The differences between the productions preprocessing-op-or-punc ([lex.operators]/1) and operator ([over.oper]/1) are the removal of the punctuators and preprocessing operators { } [ ] # ## ( ) ; : ..., the digraph alternative tokens (<: :> etc.), the non-overloadable operators . .* :: ?, and the alternative tokens that have the lexical form of keywords (and, and_eq, etc.), plus the addition of the multitoken operators new[], delete[], (), and [].

new, delete, new[] and delete[] are included in operator so that their operator-function-ids (operator new etc.) can follow the rules of the other overloadable operators. The alternative would be to duplicate that language, keep it updated, and invent a new production (dynamic-function-id?) to appear everywhere operator-function-id appears.

Note that non-overloadable operators whose name has the lexical form of an identifier (sizeof, typeid, etc.) are not included in operator, and so are not in preprocessing-op-or-punc either.
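To make the operator-function-id point concrete, here is a minimal sketch that declares the single-token operators new and delete alongside the multitoken operators new[], delete[], () and []. The type Tracked, the counter allocation_count and the helper count_allocations are names made up for this illustration; nothing here comes from the standard's text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter, bumped by every class-specific allocation below.
int allocation_count = 0;

// Hypothetical type; only its operator-function-ids matter here.
struct Tracked {
    // Single-token operators: `operator new` and `operator delete`.
    static void* operator new(std::size_t size) {
        ++allocation_count;
        if (void* p = std::malloc(size)) return p;
        throw std::bad_alloc();
    }
    static void operator delete(void* p) noexcept { std::free(p); }

    // Multitoken operators: `new[]` and `delete[]`.
    static void* operator new[](std::size_t size) {
        ++allocation_count;
        if (void* p = std::malloc(size)) return p;
        throw std::bad_alloc();
    }
    static void operator delete[](void* p) noexcept { std::free(p); }

    // Multitoken operators: `()` and `[]`.
    int operator()(int x) const { return x + 1; }
    int operator[](int i) const { return i * 2; }
};

// Uses `new` and `delete` in new-expressions and delete-expressions,
// which in turn call the operator functions declared above.
int count_allocations() {
    allocation_count = 0;
    Tracked* one = new Tracked;      // calls Tracked::operator new
    Tracked* many = new Tracked[3];  // calls Tracked::operator new[]
    assert(one != nullptr && many != nullptr);
    delete one;                      // calls Tracked::operator delete
    delete[] many;                   // calls Tracked::operator delete[]
    return allocation_count;
}
```

All six declarations use the same operator-function-id syntax, which is the uniformity the operator production buys; the same new and delete tokens also appear on their own in the expressions inside count_allocations.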

While this introduces an ambiguity between the identifier and preprocessing-op-or-punc productions, this does not affect phase 3 translation in any way. For phase 7, where the ambiguity is between keyword and operator, this is again not a problem, since the operator production and others that include the tokens new and delete e.g. new-expression ([expr.new]) do not reference the keyword or operator productions but instead contain the relevant tokens directly.
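A toy illustration of why the classification never has to be made, assuming a made-up Token type that stores only the spelling (real compilers are structured differently, and the function names are hypothetical): the two checks below match the spelling new directly, the way the grammar productions name the token directly, so no keyword-or-operator label is ever consulted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token representation: no keyword/operator tag at all.
struct Token {
    std::string spelling;
};

// Toy check for the start of a new-expression: an optional "::"
// followed by the token spelled "new". It ignores surrounding context,
// which a real parser would of course take into account.
bool starts_new_expression(const std::vector<Token>& toks, std::size_t pos) {
    if (pos < toks.size() && toks[pos].spelling == "::") ++pos;
    return pos < toks.size() && toks[pos].spelling == "new";
}

// Toy check for an operator-function-id naming `operator new`:
// the token "operator" immediately followed by the token "new".
bool is_operator_new_id(const std::vector<Token>& toks, std::size_t pos) {
    return pos + 1 < toks.size() &&
           toks[pos].spelling == "operator" &&
           toks[pos + 1].spelling == "new";
}
```

For the token sequence p = new int ; the first check succeeds at the position of new, and for void * operator new ( the second check succeeds at the position of operator; in both cases the decision depends only on the spelling and on which rule is being matched.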
