Sandeep

Reputation: 98

Multidimensional array and pointer math

Given this code:

#include <stdio.h>
int main()
{
    char a[3][5] = {2, 7, 3, 9, 5,
                    1, 2, 3, 4, 5,
                    6, 7, 8, 9, 10};
    printf("sizeof(a)=%d, sizeof(&a)=%d, sizeof(a[0])=%d\n", sizeof(a),
           sizeof(&a), sizeof(a[0]));
    printf("a+1=%p, a[0]+1=%p, &a+1=%p, &a[0]+1=%p\n", a+1, a[0] + 1, &a+1,
           &a[0]+1);
}

What would be the output if a starts at 0x1000, and a char is 1 byte? The real question is why those numbers get printed, esp. the last line.

I ran the program and observed this output:

sizeof(a)=15, sizeof(&a)=8, sizeof(a[0])=5
a+1=0x7ffc7d4dfc15, a[0]+1=0x7ffc7d4dfc11, &a+1=0x7ffc7d4dfc1f, &a[0]+1=0x7ffc7d4dfc15

The question here is why the numbers differ. For example, why is there a difference between a[0]+1 and &a+1?

Upvotes: 1

Views: 129

Answers (3)

Sandeep

Reputation: 98

OK, here's my version of the explanation. The key to understanding addresses in multidimensional arrays is a few rules:

Rule 1:

If p is a pointer to some element of an array, then p++ increments p to point to the next element (K&R, sec 5.4). And if the type of an expression or subexpression is "array of T", for some T, then the value of the expression is a pointer to the first object in the array, and the type of the expression is altered to "pointer to T" (K&R, sec A7.1).

Together, these imply that if there's an array:

T array[arraysize];

for some type T, then array becomes (with some exceptions)

T *ptr;

And (array + 1) = (ptr + 1) = ptr_value + 1 x sizeof(T). If that didn't make sense, suppose there's a

char str[] = "hello!";

Then str + 1 = str_addr + 1 x sizeof(char) = str_addr + 1, which points to the letter 'e'.

Rule 2:

In C, a two-dimensional array is really a one-dimensional array, each of whose elements is an array (K&R, sec 5.7).

Rule 3:

When sizeof is applied to an array, the result is the total number of bytes in the array (K&R, sec A7.4.8). Note, this is one of the exceptions mentioned in Rule 1 above.

Rule 4:

For any array of type:

T array[arraysize];

&array is of type T (*)[arraysize], i.e. a pointer to the whole array. Rule 1 tells us that &array + 1 will be array_addr + 1 x sizeof(T[arraysize]) = array_addr + arraysize x sizeof(T). (This is not explained very well in K&R, which I think is the root cause of confusion in understanding these issues.)

Now let's look at the original problem. Say the array a[] begins at memory location A (0x7ffc7d4dfc10 in this example).

Rule 2 tells us that a[] really is a single array of 3 elements and each of the elements is an array of size 5. Pretend there's a:

typedef char char5_t[5];

Then a[] can be represented as:

char5_t a[3];

So using Rule 3, sizeof(a) = 3 x sizeof(char5_t) = 3 x 5 = 15.

sizeof(&a) is the size of a pointer: 8 bytes (64 bits) in this example, because I was running it on a 64-bit machine.

sizeof(a[0]) is just the size of char5_t = 5. You can also apply Rule 3 to get sizeof(char[5]) = 5.

To compute (a + 1), remember a[] can be rewritten as char5_t a[3]. Rule 1 tells us that a + 1 = A + 1 x sizeof(char5_t) = A + 5 = 0x7ffc7d4dfc15.

To compute a[0] + 1, note that a[0] is itself an array of type char5_t (that is, char[5]) starting at A. By Rule 1 it decays to a char * pointing at a[0][0], so a[0] + 1 = A + 1 x sizeof(char) = 0x7ffc7d4dfc11.

&a + 1 is provided by Rule 4: a is char5_t[3], so &a + 1 = A + 3 x sizeof(char5_t) = A + 15 = 0x7ffc7d4dfc10 + 0xf = 0x7ffc7d4dfc1f.

&a[0] has type pointer to char5_t, i.e. char (*)[5]. Using Rule 1, &a[0] + 1 = A + 1 x sizeof(char5_t) = A + 5 = 0x7ffc7d4dfc15.

Upvotes: 0

user539810


The following is a brief tutorial on pointer arithmetic, since it's essential to understanding this. I recommend experimenting with it yourself.

  1. P+n, where P is a pointer and n is some integer value, results in an address n*sizeof(*P) bytes past P. The type of the result is the same as the type of P. This is the most basic case of pointer arithmetic.
  2. A+n, where A is an array and n is some integer value, results in an address n*sizeof(A[0]) bytes past A. The type of the result is a pointer to the type of A[0]. Alternatively, you may say A+n is the same as &A[0]+n. This is known as "array decay", since the expression A "decays" to a pointer to its first element (decays to &A[0]). For example, if A had type int[20][40], it would decay to the type int(*)[40].

Now we can continue.

  • sizeof(a)=15
    sizeof(a) results in 15 because there are 3*5=15 items of type char, and sizeof(char) is 1. Simply put, sizeof(array) == count*sizeof(type). In the case of a multidimensional array, you simply multiply the counts together and create one large count, resulting in sizeof(a) == (3*5)*sizeof(char), or sizeof(a) == (15)*1.

  • sizeof(&a)=8
    &a is of type char (*)[3][5], or "pointer to an array of 3 arrays of 5 chars". Because it's a pointer, its size is whatever the pointer size is on your machine. In this case, it's 8 (which means you're likely running this on a 64-bit platform).

  • sizeof(a[0])=5
    a is 3 arrays of 5 chars, so a[0] gets the first array of 5 chars. As mentioned previously, sizeof(array) == count*sizeof(type), so sizeof(a[0]) == 5*sizeof(char), or 5.

  • a+1=0x7ffc7d4dfc15
    Remember that pointer arithmetic forces an array to decay to a pointer to its first element. So rewrite a as &a[0]. Now add 1 to that pointer, which will add 1*sizeof(a[0]), or 5, to the address. If you subtract 5 from the result printed, the address of a itself is revealed: 0x7ffc7d4dfc10. This will be important for the remainder of this answer.

  • a[0]+1=0x7ffc7d4dfc11
    As previously mentioned, a starts at 0x7ffc7d4dfc10, and a[0] gets the first array of 5 chars. Since a[0] is of type char[5] and pointer arithmetic is applied, array decay occurs. Basically, you now have &a[0][0] (type: char *) instead of a[0]. This makes things simple: 0x7ffc7d4dfc10 + 1*sizeof(a[0][0]), or 0x7ffc7d4dfc10 + 1*1 = 0x7ffc7d4dfc11.

  • &a+1=0x7ffc7d4dfc1f
    Again, & yields a pointer to a. a has type char[3][5] (3 arrays of 5 chars), so &a has type char(*)[3][5] (pointer to an array of 3 arrays of 5 chars). Adding 1 to this results in 0x7ffc7d4dfc10 + 1*sizeof(char[3][5]) = 0x7ffc7d4dfc10 + (3*5)*sizeof(char) = 0x7ffc7d4dfc10 + 15. 15 in hexadecimal is 0x0f, so 0x7ffc7d4dfc10 + 0x0f = 0x7ffc7d4dfc1f.

  • &a[0]+1=0x7ffc7d4dfc15
    a[0] has type char[5]. &a[0] has type char(*)[5]. Adding 1 to this pointer results in adding 1*sizeof(a[0]), which is 5, resulting in 0x7ffc7d4dfc10 + 5 = 0x7ffc7d4dfc15.

Upvotes: 1

Peter - Reinstate Monica
Peter - Reinstate Monica

Reputation: 16016

Well, some problems arise from the wrong format specifiers. As Vlad pointed out, it is important to use %zu for the result of sizeof, because it has type size_t, which on some systems (e.g. typical 64-bit platforms) is not the same size as int. Likewise, %p expects a void *. Strictly speaking, a mismatched conversion specifier is undefined behavior. These things matter most when you pass several arguments to printf, because the argument layout will not be what printf expects. A single small value will probably print all right on a little-endian system even with the wrong integer length conversion specifier, because the first parameter's position is known and the bytes of lower significance come first.

But one question from your comment can be answered easily, in addition:

Why is there a difference between a[0]+1 and &a+1?

The reason is that a[0] is an array of 5 chars (the matrix is a sequence of 3 one-dimensional arrays of 5 chars each). It decays to a pointer to char. Adding one to that pointer advances it to the next char, that is, it adds 1 to its numerical value (sizeof(char) is 1 by definition).

By contrast, &a is not an array or a matrix but already an address: the address of the whole matrix. The numerical value is the same as &(a[0]) or &(a[0][0]), but its type is different. It's indeed a pointer to the whole matrix as a single object. (This is rarely done but perfectly well defined and legal.)

The matrix's size is 15 chars, that is, 15 bytes. Adding one to the pointer advances it to "the next matrix", i.e. it adds 15 (0xf) to the address.

Upvotes: 1
