George

Reputation: 338

Why can you initialize a string pointer as a string literal, but not as an array?

Strings can be initialized with a string literal

char word1[] = "abc";

or as a char array with a null terminator.

char word2[] = {'a', 'b', 'c', '\0'};

Instead of writing word1[], word1 can also be written with a pointer notation

char *word1 = "abc";

However, when trying to write word2 with a pointer notation

char *word2 = {'a', 'b', 'c', '\0'};

it shows me a bunch of warnings, such as

warning: excess elements in scalar initializer char *word2 = {'a', 'b', 'c', '\0'};

and when I run the program, I get Segmentation fault (core dumped).

Why is that? Why can you write char *word = "abc" but not char *word = {'a', 'b', 'c', '\0'} ?

Upvotes: 5

Views: 976

Answers (3)

Steve Summit

Reputation: 47962

There's no fundamental reason for this -- it's just the way the language was originally defined.

The basic syntax for array initialization is

type array[] = {value, value, value};

The basic syntax for pointer initialization is

type *pointer = value;

But then we have string literals. And it turns out that, deep down inside, the compiler does two almost completely different things with string literals.

If you say

char array[] = "string";

the compiler treats it just about exactly as if you had said

char array[] = { 's', 't', 'r', 'i', 'n', 'g', '\0' };

But if you say

char *p = "string";

the compiler does something quite different. It quietly creates an array for you, containing the string, more or less as if you had written

char __hidden_unnamed_array[] = "string";
char *p = __hidden_unnamed_array;

But the point -- the answer to your question -- is that the compiler does this special thing only for string literals. In the original definition of C, at least, there was no way to use the {value, value, value} syntax to create a hidden, unnamed array that you could do something else with. The {value, value, value} syntax was only defined as working as the direct initializer for an explicitly-declared array.

As @pmg mentions in a comment, however, C99 and later versions of C have a newer syntax, the compound literal, which does let you, basically, "use the {value, value, value} syntax to create a hidden, unnamed array to do something else with". So you can in fact write

char *word2 = (char[]){'a', 'b', 'c', '\0'};

and this works just fine. It works in other contexts, too: for example, you can say things like

printf("%s\n", (char[]){'d', 'e', 'f', '\0'});

Going back to a side question you asked: when you wrote

char *word2 = {'a', 'b', 'c', '\0'};

the compiler said to itself, "Wait a minute, word2 is one thing, but the initializer has four things. So I'll throw away three, and warn the programmer that I'm doing so." It then did the equivalent of

char *word2 = {'a'};

and if you later tried something like

printf("%s", word2);

you got a crash when printf tried to access address 0x00000061, which is the character code of 'a' (0x61 in ASCII) being used as if it were a pointer.

Upvotes: 5

Eric Postpischil

Reputation: 222933

Why can you initialize a string pointer as a string literal, but not as an array?

Because {'a', 'b', 'c', '\0'} is not an array; it is a list of values to put in the thing being initialized.

The syntax {'a', 'b', 'c', '\0'} does not stand for an array in C. People see it being used to initialize arrays, but, when used in that way, it is just a list of values. It could also be used to initialize a structure, because it is just listing values to put into the thing being initialized. It is not, by itself, an array.

In char *word2 = {'a', 'b', 'c', '\0'};, it does not make sense to initialize word2 with the values 'a', 'b', 'c', and '\0'. It is just one pointer and should be initialized with one value. Giving a list of four values to initialize one thing does not make sense.

In char *word2 = "abc";, "abc" is not a list of values. It is a string literal. A string literal defines a static array that is filled with the characters of the string. And then the string literal is automatically converted to a pointer to its first element, and it is this pointer that is used to initialize word2.

So char *word2 = "abc"; does two things: The string literal defines an array, and the initialization sets word2 to point to the first element of that array. In contrast, in char *word2 = {'a', 'b', 'c', '\0'};, there is nothing to define an array; the list of values is just a list of values.

Comparing this to array initializations, in char word2[] = {'a', 'b', 'c', '\0'};, the array is initialized with a list of values, which is fine. However, in char word1[] = "abc";, something special happens. C 2018 6.7.9 14 says we can initialize an array of character type with a string literal, and the characters of the string will be used to initialize the elements of the array.

Upvotes: 9

dbush

Reputation: 224102

In general, the type of the initializer must match the type of what is being initialized.

This works:

char *word1 = "abc";

Because a string literal has type array of char, and such an array decays to type char * when used in an expression or as the initializer of a pointer, so this matches the declared type.

This works:

char word2[] = {'a', 'b', 'c', '\0'};

Because an array of char is being initialized with an initializer list of characters (technically they have type int but are converted to char).

This gives a warning:

char *word2 = {'a', 'b', 'c', '\0'};

Because an initializer list with more than one element is being used to initialize a scalar, that is, a type which is not an array or struct.

And this is OK:

char word1[] = "abc";

Because the C standard specifically allows initializing a char array with a string literal, as specified in section 6.7.9p14:

An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

Upvotes: 3
