merlin2011

Reputation: 75585

Is it undefined behavior to call printf with %s and pass a zero-length char*?

Is the third line in the following code well-defined?

char* result = new char[0];
printf("%d\n", strlen(result));
printf("%s\n", result);
delete[] result;

When I run the code, I get the expected output (a length of 0 followed by two newlines). However, I'm not confident whether this is well-defined behavior or I just got lucky.

Is the call on the third line well-defined?

Upvotes: 0

Views: 537

Answers (2)

Bitwize

Reputation: 11230

Short answer: It is Undefined Behavior

Long answer: In C++, allocating an array of size 0 will produce a valid pointer to an array with no elements. From the standard (taken from this answer):

From 5.3.4/7

When the value of the expression in a direct-new-declarator is zero, the allocation function is called to allocate an array with no elements.

From 3.7.3.1/2

The effect of dereferencing a pointer returned as a request for zero size is undefined.

(Emphasis mine)

This means that there is no way to properly read from (or write to) the pointer returned from a new T[0] request.

Both strlen and the "%s" conversion of printf are defined to work on character strings terminated by a NUL character. To operate, they must read characters starting at the supplied pointer until they find that NUL. That read dereferences the pointer, which is UB here. These behaviors are defined in the C standard, since the C++ standard delegates the definitions of most C library types and functions to the C standard.
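For contrast, here is a minimal sketch of one way to get an empty string that both functions may legally read: request one element instead of zero, so the terminating NUL has somewhere to live. The helper names make_empty_string and print_empty_string are made up for illustration and are not part of the question.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: allocates an empty C string. One element is
// requested so there is room for the terminating NUL, and the {}
// value-initializes that element to '\0'.
char* make_empty_string() {
    return new char[1]{};
}

// Hypothetical helper: prints the length and contents of the empty string
// and returns the length so a caller can check it.
std::size_t print_empty_string() {
    char* result = make_empty_string();
    std::size_t length = std::strlen(result);  // reads result[0], which exists
    std::printf("%zu\n", length);              // %zu is the conversion for size_t
    std::printf("%s\n", result);               // writes only the newline
    delete[] result;
    return length;
}
```

The only change from the code in the question that matters for %s is the size of the allocation: with one zero-initialized element, the pointer refers to a real array that contains a NUL.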

printf access for %s is defined to do the following:

From C11 Standard §7.21.6.1/8

If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type.

Characters from the array are written up to (but not including) the terminating null character. If the precision is specified, no more than that many bytes are written. If the precision is not specified or is greater than the size of the array, the array shall contain a null character.

This requires access to the array, which is UB because the pointer is not valid to dereference.
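The precision rule in the quoted text can be sketched as follows. This assumes an array that really has elements and a precision no greater than its size. In that case %s does not need a terminating NUL. The names format_bounded and kHello are hypothetical, and snprintf is used only so the result can be inspected.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats at most `count` characters starting at
// `data`. The precision passed through ".*" caps how many bytes %s
// writes, so `data` needs no NUL as long as count <= array size.
std::string format_bounded(const char* data, int count) {
    char buffer[64];  // assumption: count stays well below 64
    int written = std::snprintf(buffer, sizeof buffer, "%.*s", count, data);
    if (written < 0) {
        return std::string();  // encoding error reported by snprintf
    }
    return std::string(buffer);
}

// Six characters, deliberately without a terminating NUL.
const char kHello[6] = {'H', 'e', 'l', 'l', 'o', '!'};
```

Calling format_bounded(kHello, 6) stays within the array. Passing a count larger than 6 would put the call back in the "array shall contain a null character" case.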

Bonus

Your sample code actually introduces UB already on the second line through its use of strlen, for reasons similar to those above.

strlen is defined to do the following:

From C11 Standard §7.24.6.3/3: The strlen function

Returns

The strlen function returns the number of characters that precede the terminating null character.

Which is UB for the same reason as using printf.
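If the number of allocated elements is known, a bounded search avoids the problem, because it never reads past the stated size. A minimal sketch, using a hypothetical helper named bounded_length built on std::find:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: counts the characters before the first NUL but
// never looks past `capacity` elements. With capacity 0 the search
// range is empty, so no element is read at all.
std::size_t bounded_length(const char* data, std::size_t capacity) {
    const char* end = data + capacity;
    return static_cast<std::size_t>(std::find(data, end, '\0') - data);
}
```

With a pointer obtained from new char[0], bounded_length(ptr, 0) only compares two equal pointers and returns 0 without dereferencing anything.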

Upvotes: 3

Martin Rosenau

Reputation: 18523

Sorry for having answered your "original" question (before your edit):

How about C?

In C you don't have new.

However:

strlen counts the characters in an array until a NUL character is found.

printf("%s") prints the characters in an array up to the first NUL character.

If you have a native compiler and the array does not contain a NUL character, the two functions will continue searching for a NUL character past the end of the array.

Example:

char a[6]="Hello ";
char b[100]="world!";
char c[100]="John!";
printf("%s\n",a);

If the compiler places the array b in memory directly after the array a, this example will print "Hello world!".

However, if the compiler decides to place c after a, the program will print "Hello John!".
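For comparison, a minimal sketch of the same array with room for the terminator. The example above is C. C++, which is used here, rejects char a[6]="Hello "; outright because the initializer does not fit. The names greeting and greeting_length are made up for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Letting the compiler size the array leaves room for the terminating
// NUL: six visible characters plus '\0' gives seven elements.
char greeting[] = "Hello ";

static_assert(sizeof greeting == 7, "six characters plus the terminating NUL");

// Hypothetical helper: strlen stops at greeting[6], inside the array,
// so neighbouring objects in memory are never examined.
std::size_t greeting_length() {
    return std::strlen(greeting);
}
```

With the NUL stored inside the array, the output of printf("%s\n", greeting) no longer depends on where the compiler places other variables.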

If you use a compiler that can detect accesses outside an array (e.g. a C++ compiler for .NET), you will either get an error when the end of the array is reached without a NUL character, or the end of the array will be treated the same way as a NUL character.

All in all, you can say: depending on the compiler, you will get different behavior when you pass printf("%s") an array that does not contain a NUL character.

This is what I would call undefined behavior...

I don't know how new char[0] behaves in C++; however, I think there is no difference from C...

Upvotes: 0
