Chris Gray

Reputation: 11

Why don't char arrays with separate chars end with a null-terminator unlike string literals?

I was playing around with char arrays in C++ and wrote this program:

#include <iostream>

using namespace std;

int main()
{
    // An array initialised from a brace list has exactly as many
    // elements as are listed.
    char text[] = { 'h', 'e', 'l', 'l', 'o' };

    // An array initialised from a string literal has one extra element:
    // the 0 on the end that shows where the string ends.
    char text2[] = "hello";

    cout << endl;

    cout << "The size of the first array is: " << sizeof(text) << endl;

    cout << endl;

    // sizeof yields a size_t, so the index uses the same unsigned type.
    for (size_t i = 0; i < sizeof(text); i++)
    {
        cout << i << ":" << text[i] << endl;
    }
    cout << endl;

    cout << "The size of the second array is: " << sizeof(text2) << endl;

    cout << endl;

    for (size_t i = 0; i < sizeof(text2); i++)
    {
        cout << i << ":" << text2[i] << endl;
    }
    cout << endl;

    cin.get();

    return 0;
}

This program gives me the output:

The size of the first array is: 5

0:h
1:e
2:l
3:l
4:o

The size of the second array is: 6

0:h
1:e
2:l
3:l
4:o
5:

My question is: Is there a particular reason that initializing a char array with separate chars will not have a null terminator (0) on the end unlike initializing a char array with a string literal?

Upvotes: 0

Views: 350

Answers (6)

Drew Dormann

Reputation: 63946

Is there a particular reason that initializing a char array with separate chars will not have a null terminator (0)

The reason is that this syntax...

Type name[] = { comma separated list };

...is used for initializing arrays of any type, not just char.

The "quoted string" syntax is shorthand for a very specific type of array that assumes a null terminator is desired.
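As a rough sketch of that point (the array names here are made up for illustration), the brace list gives every element type the same treatment: one element per listed value, with nothing appended.

```cpp
int    numbers[] = { 1, 2, 3 };                  // 3 ints
double ratios[]  = { 0.5, 1.5 };                 // 2 doubles
char   letters[] = { 'h', 'e', 'l', 'l', 'o' };  // 5 chars, no terminator added

// The element count always matches the number of listed values.
static_assert(sizeof(numbers) / sizeof(numbers[0]) == 3, "three ints listed");
static_assert(sizeof(ratios) / sizeof(ratios[0]) == 2, "two doubles listed");
static_assert(sizeof(letters) == 5, "five chars listed, nothing extra");
```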

Upvotes: 1

Maxim Egorushkin

Reputation: 136515

You can terminate it yourself in multiple ways:

char text1[6] = { 'h', 'e', 'l', 'l', 'o' };
char text2[sizeof "hello"] = { 'h', 'e', 'l', 'l', 'o' };
char text3[] = "hello"; // <--- my personal favourite
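The first two forms work because elements without an initializer are set to zero, so the sixth element ends up as the terminator. A minimal check, assuming the three arrays are declared at namespace scope as below (the function name `all_terminated_alike` is hypothetical):

```cpp
#include <cstring>

char text1[6] = { 'h', 'e', 'l', 'l', 'o' };               // sixth element is zero-initialized
char text2[sizeof "hello"] = { 'h', 'e', 'l', 'l', 'o' };  // sizeof "hello" is 6
char text3[] = "hello";                                    // size deduced from the literal

// True when all three arrays hold the same six bytes, terminator included.
bool all_terminated_alike()
{
    return std::memcmp(text1, text3, sizeof text3) == 0
        && std::memcmp(text2, text3, sizeof text3) == 0;
}
```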

Upvotes: 1

Vlad from Moscow

Reputation: 311126

A string literal such as "hello" has the type of a constant character array and is initialized as if by

const char string_literal_hello[] = { 'h', 'e', 'l', 'l', 'o', '\0' };

As you can see, the type of the string literal is const char[6]: it contains six characters.

Thus this declaration

char text2[] = "hello"; 

which can also be written as

char text2[] = { "hello" }; 

is in effect equivalent to the following declaration

char text2[] = { 'h', 'e', 'l', 'l', 'o', '\0' };

That is, when a string literal is used as an initializer of a character array, all of its characters, including the terminating zero, are used to initialize the array.
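If you want to convince yourself, something like the following should compile and report identical contents for all three spellings. The variable and function names are invented for this sketch.

```cpp
#include <cstring>

char from_literal[]        = "hello";
char from_braced_literal[] = { "hello" };
char from_char_list[]      = { 'h', 'e', 'l', 'l', 'o', '\0' };

// All three arrays get the same deduced size of six elements.
static_assert(sizeof(from_literal) == sizeof(from_char_list), "same size");
static_assert(sizeof(from_braced_literal) == sizeof(from_char_list), "same size");

// True when the three arrays also hold the same bytes.
bool same_contents()
{
    return std::memcmp(from_literal, from_char_list, sizeof from_char_list) == 0
        && std::memcmp(from_braced_literal, from_char_list, sizeof from_char_list) == 0;
}
```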

Upvotes: 1

Benjamin Lindley

Reputation: 103751

When you designate a double quote delimited set of adjacent characters (a string literal), it is assumed that what you want is a string. And a string in C means an array of characters that is null-terminated, because that's what the functions that operate on strings (printf, strcpy, etc...) expect. So the compiler automatically adds that null terminator for you.

When you provide a brace delimited, comma separated list of single quote delimited characters, it is assumed that you don't want a string, but you want an array of the exact characters you specified. So no null terminator is added.

C++ inherits this behavior.
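One consequence: handing the five-element array to strlen or to printf with a plain %s is undefined behavior, because those functions keep reading until they find a terminator. Here is a sketch of two ways to pass the length explicitly instead; the helper names `to_string_exact` and `format_exact` are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Builds a std::string from exactly 'length' characters; no terminator is needed.
std::string to_string_exact(const char* data, std::size_t length)
{
    return std::string(data, length);
}

// Uses a printf-style precision ("%.*s") so that at most 'length' characters are read.
std::string format_exact(const char* data, int length)
{
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%.*s", length, data);
    return buffer;
}
```

With char text[] = { 'h', 'e', 'l', 'l', 'o' }; you would call to_string_exact(text, sizeof text).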

Upvotes: 0

Cheers and hth. - Alf

Reputation: 145429

A curly-brace initializer just provides the specified values for an array (or, if the array is larger, the rest of the items are zero-initialized). It's not a string even if the items are char values. char is just the smallest integer type.

A string literal denotes a zero-terminated sequence of values.

That's all.
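A small sketch of both points, with made-up names: unlisted elements come out as zero, and char values take part in ordinary integer arithmetic.

```cpp
int  values[4] = { 1, 2 };                   // the two unlisted elements are set to 0
char digits[]  = { '0', '0' + 1, '0' + 2 };  // digit characters are consecutive integers

static_assert(sizeof(digits) == 3, "three values listed, three elements");

// Adds up the numeric value of each digit character: 0 + 1 + 2.
int sum_of_digits()
{
    int sum = 0;
    for (char c : digits)
        sum += c - '0';
    return sum;
}
```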

Upvotes: 4

Bathsheba

Reputation: 234875

Informally, it's the second quotation character in a string literal of the form "foo" that adds the NUL-terminator.

In C++, "foo" has the type const char[4], which decays to a const char* in certain situations.

It's just how the language works, that's all. And it's very useful since it dovetails nicely with all the standard library functions that model a string as a pointer to the first element in a NUL-terminated array of chars.

Splicing in an extra element with something like char text[] = { 'h', 'e', 'l', 'l', 'o' }; would be really annoying and it could introduce inconsistency into the language. Would you do the same thing for signed char, and unsigned char, for example? And what about int8_t?
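The const char[4] claim and the decay can be checked at compile time. This is only a sketch; `length_of` is a hypothetical function standing in for any library routine that takes a const char*.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// A string literal is an lvalue, so decltype reports a reference to const char[4].
static_assert(std::is_same<decltype("foo"), const char (&)[4]>::value,
              "three characters plus the NUL terminator");

// Array-to-pointer decay turns that type into const char*.
static_assert(std::is_same<std::decay<decltype("foo")>::type, const char*>::value,
              "the array decays to a pointer to its first element");

// Passing the literal here triggers the decay; strlen relies on the terminator.
std::size_t length_of(const char* s)
{
    return std::strlen(s);
}
```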

Upvotes: 1
