Reputation: 343
I am a student learning C++, and I am trying to understand how null-terminated character arrays work. Suppose I define a char array like so:
char* str1 = "hello world";
As expected, strlen(str1)
is equal to 11, and it is null-terminated.
Where does C++ put the null terminator, if all 11 elements of the above char array are filled with the characters "hello world"? Is it actually allocating an array of length 12 instead of 11, with the 12th character being '\0'
? CPlusPlus.com seems to suggest that one of the 11 would need to be '\0'
, unless it is indeed allocating 12.
Suppose I do the following:
// Create a new char array
char* str2 = (char*) malloc( strlen(str1) );
// Copy the first one to the second one
strncpy( str2, str1, strlen(str1) );
// Output the second one
cout << "Str2: " << str2 << endl;
This outputs Str2: hello worldatcomY╗°g♠↕
, which I assume is C++ reading the memory at the location pointed to by the pointer char* str2
until it encounters what it interprets to be a null character.
However, if I then do this:
// Null-terminate the second one
str2[strlen(str1)] = '\0';
// Output the second one again
cout << "Terminated Str2: " << str2 << endl;
It outputs Terminated Str2: hello world
as expected.
But doesn't writing to str2[11]
imply that we are writing outside of the allocated memory space of str2
, since str2[11]
is the 12th byte, but we only allocated 11 bytes?
Running this code does not seem to cause any compiler warnings or run-time errors. Is this safe to do in practice? Would it be better to use malloc( strlen(str1) + 1 )
instead of malloc( strlen(str1) )
?
Upvotes: 24
Views: 60042
Reputation: 424
I think what you need to know is that char array indices start at 0: the characters of the string occupy positions 0 through length-1, and the terminator ('\0') sits at position length.
In your case:
str1[0] == 'h';
str1[10] == 'd';
str1[11] == '\0';
This is why str2[strlen(str1)] = '\0'; targets the correct index, provided str2 has room for 12 bytes.
The output after strncpy is garbled because it copies only 11 elements (0..10), so you need to add the terminator manually (str2[11] = '\0').
Upvotes: 0
Reputation: 26547
I think you are confused by the return value of strlen
. It returns the length of the string, which should not be confused with the size of the array that holds the string. Consider this example:
char* str = "Hello\0 world";
I added a null character in the middle of the string, which is perfectly valid. Here the array will have a length of 13 (12 characters + the final null character), but strlen(str)
will return 5, because there are 5 characters before the first null character. strlen
just counts the characters until a null character is found.
So if I use your code:
char* str1 = "Hello\0 world";
char* str2 = (char*) malloc(strlen(str1)); // strlen(str1) will return 5
strncpy(str2, str1, strlen(str1));
cout << "Str2: " << str2 << endl;
The str2 array will have a length of 5, and won't be terminated by a null character (because strlen
doesn't count it). Is this what you expected?
Upvotes: 2
Reputation: 6716
For a standard C string, the array that stores the string must be at least one character longer than the length of the string in characters. So your "hello world"
string has a string length of 11 but requires a backing array with 12 entries.
The reason for this is simply the way those strings are read. The functions handling them read the characters of the string one by one until they find the termination character '\0'
and stop at that point. If this character is missing, those functions just keep reading memory until they either find a byte that happens to be '\0' or hit a protected memory area, which causes the host operating system to kill your application.
Also, if you create a character array of length 11 and write the string "hello world"
into it, you will run into serious problems, because the array needs to hold at least 12 characters. The byte that follows the array in memory gets overwritten, resulting in unpredictable side effects.
Also, since you are working with C++, you might want to look into std::string
. This class is part of the C++ standard library and provides safer, more convenient string handling.
Upvotes: 1
Reputation: 755457
In the case of a string literal the compiler is actually reserving an extra char
element for the \0
element.
// Create a new char array
char* str2 = (char*) malloc( strlen(str1) );
This is a common mistake new C programmers make. When allocating the storage for a char*
you need to allocate the number of characters + 1 more to store the \0
. Not allocating the extra storage here means the following line is also an error:
// Null-terminate the second one
str2[strlen(str1)] = '\0';
Here you're actually writing past the end of the memory you allocated. When you allocate X elements, the last valid element is at offset X - 1
. Writing to offset X
causes undefined behavior. It will often appear to work, but it is a ticking time bomb.
The proper way to write this is as follows:
size_t size = strlen(str1) + sizeof(char);
char* str2 = (char*) malloc(size);
strncpy( str2, str1, size);
// Output the second one
cout << "Str2: " << str2 << endl;
In this example, an explicit str2[size - 1] = '\0'
assignment isn't needed. The strncpy
function fills any remaining space up to the count with null characters. Here there are only size - 1
characters in str1
, so the final element of str2 is filled with \0.
Upvotes: 19
Reputation: 18522
The literal "hello world"
is a char
array that looks like:
{ 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' }
So, yes, the literal is 12 char
s in size.
Also, malloc( strlen(str1) )
is allocating memory for 1 less byte than is needed, since strlen
returns the length of the string, not including the NUL terminator. Writing to str2[strlen(str1)]
is writing 1 byte past the end of the memory that you've allocated.
Your compiler won't tell you that, but if you run your program through valgrind or a similar program available on your system it'll tell you if you're accessing memory you shouldn't be.
Upvotes: 4
Reputation: 272752
Is it actually allocating an array of length 12 instead of 11, with the 12th character being '\0'?
Yes.
But doesn't writing to
str2[11]
imply that we are writing outside of the allocated memory space of str2
, since str2[11]
is the 12th byte, but we only allocated 11 bytes?
Yes.
Would it be better to use
malloc( strlen(str1) + 1 )
instead of malloc( strlen(str1) )
?
Yes, because the second form is not long enough to copy the string into.
Running this code does not seem to cause any compiler warnings or run-time errors.
Detecting this in all but the simplest cases is a very difficult problem. So the compiler authors simply don't bother.
This sort of complexity is exactly why you should be using std::string
rather than raw C-style strings if you are writing C++. It's as simple as this:
std::string str1 = "hello world";
std::string str2 = str1;
Upvotes: 6