Is a 64-bit character literal possible in C?

Question

The following code compiles fine:

#include <stdint.h>

uint32_t myfunc32(void) {
    uint32_t var = 'asdf';
    return var;
}

The following code gives the warning, "character constant too long for its type":

#include <stdint.h>

uint64_t myfunc64(void) {
    uint64_t var = 'asdfasdf';
    return var;
}

Indeed, the 64-bit character literal gets truncated to a 32-bit constant by GCC. Are 64-bit character literals not a feature of C? I can't find any good info on this.

Edit: I am doing some more testing. It turns out that another compiler, Metrowerks CodeWarrior, can compile the 64-bit character literals as expected. If this is not already a feature of GCC, it really ought to be.

Marco Bonelli · Accepted Answer

Are 64-bit character literals not a feature of C?

Indeed they are not. As per C99 §6.4.4.4, paragraph 10 (page 73):

An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined.

So character constants have type int, which on most modern platforms is 32 bits wide. A character constant therefore cannot hold 64 bits on those platforms, whatever the type of the variable it is assigned to. Moreover, the value of the int resulting from a multi-character constant is implementation-defined, so you can't really expect much from int x = 'abc'; unless you are targeting a specific compiler and compiler version. You should avoid such statements in portable C code.
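To see the "type int" rule in practice, here is a minimal sketch. The function names are made up for illustration, and the _Generic check assumes a C11 (or later) compiler. Note this is specific to C: in C++ a single-character literal has type char.

```c
#include <assert.h>
#include <stddef.h>

/* Size of a character constant; in C this equals sizeof(int), not 1. */
size_t char_constant_size(void) {
    return sizeof('a');
}

/* Returns 1 if the type of 'a' is int (C11 _Generic selection). */
int char_constant_is_int(void) {
    return _Generic('a', int: 1, default: 0);
}

/* Hypothetical self-check tying the two observations together. */
void check_char_constant_type(void) {
    assert(char_constant_size() == sizeof(int));
    assert(char_constant_is_int());
}
```

Since the constant is already an int before the assignment happens, declaring the destination as uint64_t cannot bring back characters that did not fit.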

As for GCC-specific behavior, the GCC documentation says:

The numeric value of character constants in preprocessor expressions. The preprocessor and compiler interpret character constants in the same way; i.e. escape sequences such as ‘\a’ are given the values they would have on the target machine.

The compiler evaluates a multi-character character constant a character at a time, shifting the previous value left by the number of bits per target character, and then or-ing in the bit-pattern of the new character truncated to the width of a target character. The final bit-pattern is given type int, and is therefore signed, regardless of whether single characters are signed or not. If there are more characters in the constant than would fit in the target int the compiler issues a warning, and the excess leading characters are ignored.

For example, 'ab' for a target with an 8-bit char would be interpreted as ‘(int) ((unsigned char) 'a' * 256 + (unsigned char) 'b')’, and '\234a' as ‘(int) ((unsigned char) '\234' * 256 + (unsigned char) 'a')’.
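If you really need a 64-bit value built from eight characters, one portable option is to apply the same shift-and-or rule yourself on a uint64_t. The following is only a sketch: pack_chars64 is a hypothetical helper, not part of GCC or the standard library, and it assumes an 8-bit char.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: packs the first n bytes of s into a uint64_t,
 * shifting the previous value left by 8 bits and or-ing in the next
 * character, as in the rule quoted above (8-bit char assumed).
 */
uint64_t pack_chars64(const char *s, size_t n) {
    assert(n <= sizeof(uint64_t)); /* more than 8 bytes would not fit */
    uint64_t value = 0;
    for (size_t i = 0; i < n; i++) {
        value = (value << 8) | (unsigned char)s[i];
    }
    return value;
}
```

With this, pack_chars64("ab", 2) yields 'a' * 256 + 'b' as in the documented example, and pack_chars64("asdfasdf", 8) yields 0x6173646661736466 regardless of how the compiler treats multi-character constants.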
