Li Chen

Reputation: 5270

How is char* textMessages[] formatted in memory?

As we know, a multidimensional array such as int array1[3][2] = {{0, 1}, {2, 3}, {4, 5}}; is contiguous, so its memory layout is the same as that of int array2[6] = { 0, 1, 2, 3, 4, 5 };

for (int *p = *array1, i = 0; i < 6; i++, p++)
{
    std::cout << *p << std::endl;
}

0
1
2
3
4
5

Then I have this code:

char *textMessages[] = {
    "Small text message",
    "Slightly larger text message",
    "A really large text message that ",
    "is spread over multiple lines*"
};

I find that its layout is not the same as int[3][2]:


char *textMessages[] = {
    "Small text message",
    "Slightly larger text message",
    "A really large text message that ",
    "is spread over multiple lines*"
};
// a, c, e point near the end of one message; b, d, f point at the start of the next one
char *a = *(textMessages) + 17, *b = *(textMessages + 1);
char *c = *(textMessages + 1) + 27, *d = *(textMessages + 2);
char *e = *(textMessages + 2) + 31, *f = *(textMessages + 3);
std::ptrdiff_t a_to_b = b - a, c_to_d = d - c, e_to_f = f - e;
printf("they(a b c d e f) are all messages' first or final element: %c %c %c %c %c %c\n", *a, *b, *c, *d, *e, *f);
// %p expects void*, and %td is the conversion for std::ptrdiff_t
printf("\n\naddress of element above: \n%p\n%p\n%p\n%p\n%p\n%p\n", (void *)a, (void *)b, (void *)c, (void *)d, (void *)e, (void *)f);
printf("\n\nptrdiff of a to b, c to d and e to f: %td %td %td\n", a_to_b, c_to_d, e_to_f);

they(a b c d e f) are all messages' first or final element: e S e A t i


address of element above:
002F8B41
002F8B44
002F8B5F
002F8B64
002F8B83
002F8B88


ptrdiff of a to b, c to d and e to f: 3 5 5

My question is:

  1. What does 3 5 5 mean here?
  2. Why 3 5 5, not 5 5 5
  3. What's the layout here?

Edit: I don't think this question is a duplicate of "How are multi-dimensional arrays formatted in memory?", because what I am asking is different from what that question asks, and its answers do not resolve my question.

Upvotes: 0

Views: 108

Answers (4)

eerorika

Reputation: 238401

How is char* textMessages[] formatted in memory?

Just like any other one-dimensional array. Each element is stored in a consecutive memory location. Those elements are pointers to char objects.

Each of those pointers points to the beginning of a string literal. String literals have static storage duration, and where they are placed in memory is up to the implementation.

What does 3 5 5 mean here?

You've done pointer subtraction between pointers that do not point to the same array (each string literal is a separate array). The behaviour of the program is technically undefined because of this.

In practice, most of the time, what you get is the distance in memory between the pointed-to objects. Since the location of those arrays is chosen by the implementation, there isn't anything meaningful about those values.

Why 3 5 5, not 5 5 5

  • Because the behaviour is undefined
  • Because that happens to be the distance between the pointed-to character objects. The distance will depend on where the compiler chooses to store the string literals.

You can pick either explanation depending on your point of view.


PS. You are converting the string literals to a pointer to non-const char. This conversion has been deprecated ever since C++ was standardized and has been ill-formed since C++11.

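PPS. Accessing elements through int *p = *array1 beyond the bounds of array1[0], which has size 2, as in your first code snippet, technically has undefined behaviour. In the second snippet, *(textMessages + 2) + 31 is still inside its literal (33 characters plus the terminator), so the problem there is only the subtraction between pointers into different literals.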

Upvotes: 2

walter

Reputation: 1239

3 5 5 means nothing here, and neither would 5 5 5.

char *textMessages[] is an array of char*, so its elements are pointers. They (the pointers) are contiguous in the array, but the values of these pointers are not related to each other. The strings in your code may be stored in different places.

The result on my compiler is: 243 309 1861

Upvotes: 1

Serge Ballesta

Reputation: 149075

From a language-lawyer point of view, this:

int array1[3][2] = {{0, 1}, {2, 3}, {4, 5}};
for (int *p = *array1, i = 0; i < 6; i++, p++)
{
    std::cout << *p << std::endl;
}

is undefined per the standard, because array1 is an array of 3 arrays of size 2. So 0 and 1 are in the same array, but 1 and 2 are not. The first increment of the pointer is fine; the second makes it point one past the end of the first array (which is still allowed), but dereferencing it there is formally UB.

Of course, in practice current and past implementations accept it.


But this is a quite different animal:

char *textMessages[] = {
    "Small text message",
    "Slightly larger text message",
    "A really large text message that ",
    "is spread over multiple lines*"
};

Here textMessages is an array of pointers and not a 2D array. But it is even worse. It is an array of 4 char * pointers pointing to string literals, and it is undefined behaviour to modify a string literal. That means that textMessages[0][0] = 'X'; is likely to crash the program.

But once we know we have an array of pointers to string literals, everything becomes clear: the compiler has stored the string literals in memory however it wanted and has just handed out pointers to that memory. So the 3, 5, 5 are just by-products of padding, because your compiler decided to store the string literals that way.

Upvotes: 1

IlBeldus

Reputation: 1040

String literals have static storage duration, meaning that they are allocated in memory when the program starts. They are not guaranteed to be contiguous with one another because, at that point, the program might not even know they belong to an array. When the array is then constructed, the addresses of those strings are placed in contiguous memory (but, of course, not the strings themselves).

P.S. What I call "strings" above really means "string literals".

Upvotes: 1
