Reputation: 2136
I am assuming that the C++ token types (as per 2.7 Tokens [lex.token]) do not form intersecting sets (i.e. int is considered to belong only to the keyword token type, not to both the keyword and identifier token types). Taking that into account, the following question arises.
C++11 quote:
2.2 Phases of translation [lex.phases]
7 White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. (2.7). The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.
So, the syntactic and semantic analysis of the C++ text is performed AFTER the text is split into tokens.
Another C++11 quote:
2.7 Tokens [lex.token]
token:
identifier
keyword
literal
operator
punctuator
Nowhere in the standard have I found a definition for the operator and punctuator grammar non-terminals. Anyway, according to 2.12 Keywords and 2.13 Operators and punctuators, the token new can be either a keyword or an operator token. How can a C++ compiler possibly determine the type of the new token BEFORE performing syntactic and semantic analysis of the code?
Upvotes: 1
Views: 789
Reputation: 157484
new and delete are the overloadable operators whose name is formed of a single token.
The differences between the productions preprocessing-op-or-punc ([lex.operators]/1) and operator ([over.oper]/1) are the removal of the punctuators and preprocessing operators { } [ ] # ## ( ) ; : ..., the digraph alternate tokens <: :> etc., the non-overloadable operators . .* :: ?, the lexical keyword alternate tokens and and_eq etc., and the addition of the multitoken operators new[], delete[], (), and []. new, delete, new[] and delete[] are included in operator so that their operator-function-ids (operator new etc.) can follow the rules of other overloadable operators without having to duplicate language, keep it updated, and invent a new production (dynamic-function-id?) to occur everywhere operator-function-id occurs. Note that non-overloadable operators whose name has the lexical form of an identifier (sizeof, typeid, etc.) are not included in operator, and thus not in preprocessing-op-or-punc either.
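To illustrate the "multitoken" point, here is a minimal sketch. The type Buffer and the function count_array_allocations are hypothetical names made up for this example. Because new[], delete[] and () are each spelled with several tokens, white space between those tokens is allowed inside the operator-function-id:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical type, used only to illustrate the token structure.
struct Buffer {
    static int array_allocations;  // counts calls to Buffer::operator new[]
    char byte;

    // `new`, `[` and `]` are three separate tokens, so spaces are fine here.
    static void* operator new [ ] (std::size_t size) {
        ++array_allocations;
        if (void* p = std::malloc(size)) return p;
        throw std::bad_alloc();
    }

    // Likewise `delete`, `[` and `]`.
    static void operator delete [ ] (void* p) noexcept { std::free(p); }

    // The function call operator is spelled with the two tokens `(` and `)`.
    int operator ( ) (int x) const { return x + 1; }
};

int Buffer::array_allocations = 0;

// Allocates an array through the class-specific operator new[] and
// reports how many times that operator was called (or -1 on a mismatch).
int count_array_allocations() {
    Buffer* items = new Buffer[4];   // array new-expression
    int result = items[0](41);       // calls Buffer::operator()
    delete [ ] items;                // array delete-expression
    return result == 42 ? Buffer::array_allocations : -1;
}
```

Writing operator new[] without spaces is of course the usual style; the spaced form is shown only to make the separate tokens visible.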
While this introduces an ambiguity between the identifier and preprocessing-op-or-punc productions, this does not affect phase 3 translation in any way. For phase 7, where the ambiguity is between keyword and operator, this is again not a problem, since the operator production and others that include the tokens new and delete, e.g. new-expression ([expr.new]), do not reference the keyword or operator productions but instead contain the relevant tokens directly.
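As a small sketch of the last point (Widget and make_and_destroy are hypothetical names, not anything from the standard), the same token new appears below in two different grammatical positions. In both, the enclosing production names the token itself, so no keyword-versus-operator decision is needed for the token:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical type, used only for illustration.
struct Widget {
    static int allocations;  // counts calls to Widget::operator new

    // Here `new` is part of an operator-function-id (`operator new`).
    static void* operator new(std::size_t size) {
        ++allocations;
        if (void* p = std::malloc(size)) return p;
        throw std::bad_alloc();
    }

    // Matching deallocation function, so `delete w` frees via std::free.
    static void operator delete(void* p) noexcept { std::free(p); }
};

int Widget::allocations = 0;

// Creates and destroys one Widget, returning the allocation count so far.
int make_and_destroy() {
    Widget* w = new Widget;  // here `new` begins a new-expression
    delete w;                // and `delete` begins a delete-expression
    return Widget::allocations;
}
```

The parser tells the two uses apart purely from context: new directly after the token operator versus new at the start of an expression.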
Upvotes: 2