John

Reputation: 71

How does conversion of string literals to char arrays actually work in C++?

I am trying to understand how pointers, arrays and string literals work in C++.

Suppose we have the following line of code:

const char* const letters[] = {"A+","A"};

If I understand correctly, this declaration declares letters to be an array of constant pointers to constant characters. From my understanding, the compiler will actually convert each string literal to a null terminated char array and each element of letters is actually a constant pointer to the first element of that array.

So, for instance, letters[0] is actually a pointer to the "A" of "A+". However

std::cout<< letters[0];

actually outputs "A+" to the standard output. How can this be? Especially since letters[0] is a constant pointer?

My second question is related to the declaration above: if string literals are actually const char arrays, then why does the following line of code

const char* const letters[] = {{'A','+','\0'},{'A','\0'}};

throw

error: braces around scalar initializer for type ‘const char* const’ const char* const letters[] = {{'A','+','\0'},{'A','\0'}}; ^

Thank you!

Upvotes: 0

Views: 1563

Answers (2)

Peter

Reputation: 36597

The standard specifies that a string literal is represented - as far as your program is concerned - as an array of const characters of static storage duration with a trailing '\0' terminator. The standard doesn't specify HOW a compiler achieves this effect, only that your program can treat the string literal in that way.

So modifying a string literal is either prevented (e.g. since C++11, passing a string literal to a function expecting a char * is ill-formed and must be diagnosed, although some compilers report it only as a warning by default) or - if code works around the type system to modify any character in a string literal - involves undefined behaviour.

In your example, letters[0] is of type const char * const, and has a value equal to the address of the first character in the string literal "A+".

std::cout, being of type std::ostream, has an operator<<() that accepts a const char *. This function is called by the statement std::cout << letters[0] and the function assumes the const char * points at a zero-terminated array of char. It iterates over that array, outputting each character individually, until it encounters the trailing '\0' (which is not output).

The thing is, a const char * means that the pointer is to a const char, not that the pointer cannot be changed (that would be char * const). So it is possible to increment such a pointer, but not to change the value it points at. The elements of letters are themselves const, so they cannot be incremented, but a copy can be. So we can do

 const char *p = letters[0];

 while (*p != '\0')
 {
     std::cout << *p;
     ++p;
 }

which loops over the characters of the string literal "A+", printing each one individually, and stopping when it reaches the '\0' (the above produces the same observable output as std::cout << letters[0]).

However, in the above

*p = 'C';

will not compile, since the definition of p tells the compiler that *p cannot be changed. However, incrementing p is still allowed.

The reason that

const char* const letters [] = {{'A','+','\0'},{'A','\0'}};

does not compile is that each element of letters is a pointer, and an array initialiser cannot be used to initialise a pointer. For example:

const int *nums =  {1,2,3};                          // invalid
const int * const nums2 [] = {{1,2,3}, {4,5,6}};     //  invalid

are both illegal. Instead, one is required to define arrays, not pointers.

const int nums[] = {1,2,3};
const int nums2[][3] = {{1,2,3}, {4,5,6}};

All versions of C and C++ forbid initialising pointers (or arrays of pointers in your example) in this way.

Technically, the ability to use string literals to initialise pointers is actually the anomaly, not the prohibition on initialising pointers using arrays. The reasons C introduced that exemption for string literals are historical (in very early days of C, well before K&R C, string literals could not be used to initialise pointers either).

Upvotes: 2

G. Sliepen

Reputation: 7973

As for your first question, the type of letters[0] is const char * const. This is a pointer to a character, but not a character itself. When you pass a pointer to a character to std::cout, it treats it as a NUL-terminated C string and writes out all characters from the start of the memory pointed to until it encounters a NUL byte. That is why the output is A+. You can pass the first character of the first string by itself by writing:

std::cout << letters[0][0];

The fact that the pointers and/or the C strings themselves are const doesn't matter here, since nothing is writing to them.

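As for your second question, const char * const letters[] declares an array of pointers, and each element is a single pointer, but you are providing a braced list of characters for each element. A braced list of several values cannot initialise a pointer. If you really want two arrays of characters, declare a two-dimensional array of char instead:

const char letters[][3] = {{'A', '+', '\0'}, {'A', '\0'}};

That stores the same characters as the string literals in your first declaration, although letters[0] is now an array of char rather than a pointer. Or if you want a single array:

const char letters[] = {'A', '+', '\0', 'A', '\0'};

Those five characters are the same as the ones at the start of the string literal in the following line (the literal also gets its own trailing '\0'):

const char *const letters = "A+\0A";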

Upvotes: 0
