cFsichb

Reputation: 409

What is the correct output of sizeof("string")?

On a microcontroller, in order to avoid loading settings from a previous firmware build, I also store the compilation time, which is checked when the settings are loaded.

The microcontroller project is built with 'mikroC PRO for ARM' from MikroElektronika.

Because it is easier to debug there, I wrote the code with MinGW on my PC and, after checking it thoroughly, put it into mikroC.

The code using that check failed to work properly. After an evening of frustrating debugging, I found that sizeof("...") yields different values on the two platforms, causing a buffer overflow as a consequence.

But now I don't know whose fault it is.

To re-create the problem, use the following code:

#include <stdio.h>

#define SAVEFILECHECK_COMPILE_DATE __DATE__ " " __TIME__

char strA[sizeof(SAVEFILECHECK_COMPILE_DATE)];   /* sized from the literal */
char strB[] = SAVEFILECHECK_COMPILE_DATE;         /* sized by the initializer */

int main(void) {
    printf("sizeof(#def): %d\n", (int)sizeof(SAVEFILECHECK_COMPILE_DATE));
    printf("sizeof(strA): %d\n", (int)sizeof(strA));
    printf("sizeof(strB): %d\n", (int)sizeof(strB));
    return 0;
}

On MinGW it returns (as expected):

sizeof(#def): 21
sizeof(strA): 21
sizeof(strB): 21

However, on 'mikroC PRO for ARM' it returns:

sizeof(#def): 20
sizeof(strA): 20
sizeof(strB): 21

This difference caused a buffer overflow down the line (overwriting byte zero of a pointer – ouch).

21 is the answer I expect: 20 chars and the '\0' terminator.

Is this one of the 'it depends' things in C or is there a violation of the sizeof operator behavior?

Upvotes: 39

Views: 5299

Answers (5)

Lundin

Reputation: 213960

This is all 100% standardized. C17 6.10.8.1:

__DATE__ The date of translation of the preprocessing translation unit: a character string literal of the form "Mmm dd yyyy" ... and the first character of dd is a space character if the value is less than 10.
...
__TIME__ The time of translation of the preprocessing translation unit: a character string literal of the form "hh:mm:ss"

  • "Mmm dd yyyy" = 11
  • "hh:mm:ss" = 8
  • " " (the space you used for string literal concatenation) = 1
  • Null termination = 1

11 + 8 + 1 + 1 = 21

As for sizeof, a string literal is an array. Whenever you pass a declared array to sizeof, the array does not "decay" into a pointer to the first element, so sizeof will report the size of the array in bytes. In the case of string literals, this includes the null termination, C17 6.4.5:

In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.

(Translation phase 6 is also mentioned, which is the string literal concatenation phase, i.e., string literal concatenation is guaranteed to happen before the null termination is added.)

So it would appear that mikroC PRO is non-conforming/bugged. There are lots of questionable embedded systems compilers out there, for sure.

Upvotes: 43

supercat

Reputation: 81179

As others have noted, the behavior of sizeof on a string literal has long been standardized as yielding a value one larger than the length of the string represented thereby, rather than the size of the smallest character array that could be initialized using that string literal. That having been said, if one wishes to make code compatible even with compilers that adopt the latter interpretation, I'd suggest using an expression like (1-(sizeof "")+(sizeof "stringLiteral of interest")), which would allow code to operate correctly with the quirky compilers without sacrificing compatibility with standard ones.

Upvotes: 11

dbush

Reputation: 224082

This is a compiler bug. String literals, whether they consist of a single quoted sequence or multiple adjacent quoted sequences, are stored as static arrays which always contain a terminating null byte. Here that is not happening, although it should.

This is specified in section 6.4.5p6 of the C standard regarding string literals:

In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. 78) The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence.

This means that sizeof(SAVEFILECHECK_COMPILE_DATE) should count both the characters in the string and the terminating null byte, but the compiler for some reason isn't including the null byte.

Upvotes: 5

chqrlie

Reputation: 144780

Is this one of the 'it depends' things in C or is there a violation of the sizeof operator behavior?

The behavior is fully defined in the C Standard. Below are the relevant quotes from the published C99 standard, which were identical except for the section numbers in the C90 (ANSI C) version and have not been modified in essence in more recent versions, up to and including the upcoming C23 version:

The __DATE__ and __TIME__ macros are specified by

6.10.8 Mandatory macros

__DATE__ The date of translation of the preprocessing translation unit: a character string literal of the form "Mmm dd yyyy", where the names of the months are the same as those generated by the asctime function, and the first character of dd is a space character if the value is less than 10. If the date of translation is not available, an implementation-defined valid date shall be supplied.
__TIME__ The time of translation of the preprocessing translation unit: a character string literal of the form "hh:mm:ss" as in the time generated by the asctime function. If the time of translation is not available, an implementation-defined valid time shall be supplied.

From the above, if the time of translation is available, the macro SAVEFILECHECK_COMPILE_DATE expands to 3 string literals for a total of 11+1+8 = 20 characters, hence 21 bytes including the null terminator. If the time of translation is not available, implementation-defined valid dates and times must be used, hence the behavior must be the same.

5.1.1.2 Translation phases

  6. Adjacent string literal tokens are concatenated.
  7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.

Hence the fact that the argument to sizeof is made of 3 adjacent string literals is irrelevant: all occurrences of the sizeof operator in your examples get a single string literal argument in phase 7. Then:

6.5.3.4 The sizeof operator

4  When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1. When applied to an operand that has array type, the result is the total number of bytes in the array.

Therefore all 3 outputs in your example must show 21 bytes. You have found a bug in the mikroC compiler: you should report it and find a workaround for your current projects.

Upvotes: 16

Nick

Reputation: 10539

#include <stdio.h>

int main(){
    printf("%zu\n", sizeof("aa"));
}

Interestingly, in this case "aa" does not decay to a pointer but acts as a char array. Since the array has 3 elements (including the zero terminator), the output is 3.

This defines a string (an array of char):

#include <stdio.h>

#define SAVEFILECHECK_COMPILE_DATE __DATE__ " " __TIME__

int main(){
    printf("%zu\n", sizeof(SAVEFILECHECK_COMPILE_DATE));
}

Every time you compile, it is different, because of __DATE__ and __TIME__.

My current result is 21, but it may change.

The same applies to C++.

Upvotes: -4
