Width prefixes to string constants

Question

The latest version of the C standard provides for width prefixes on string constants; for example, u8"a" is a single preprocessing token.

Does whether you get one or two preprocessing tokens depend on the exact letters in the prefix? E.g. is it the case that u9"a" is still two preprocessing tokens?

paxdiablo · Accepted Answer

C11 specifies in 6.4 that a string literal is one of the pre-processing tokens:

preprocessing-token:
    header-name
    identifier
    pp-number
    character-constant
    string-literal
    punctuator
    each non-white-space character that cannot be one of the above

Hence u8"a" is a single token, because section 6.4.5 (String literals) lists u8 as a valid encoding prefix:

string-literal:
    encoding-prefix(opt) " s-char-sequence(opt) "
encoding-prefix:
    u8
    u
    U
    L

The sequence u9"a" is not a string literal because u9 is not one of the valid prefixes.

The u9 would (from my reading) be treated as an identifier while the "a" would be a string literal, so that would be two separate pre-processing tokens.
