S.C.

Reputation: 231

What is the dereference operator telling the compiler to do during multidimensional array references?

I've been playing around with multidimensional arrays and bracket notation, trying my best to come up with a conceptual understanding of the interplay between dereferencing, pointer type, and pointer arithmetic.

For this problem, consider the following arbitrary 3D array reference:

    arr[i][j][k]

Now, I believe the following conversion is equivalent (I've run a few code examples that I believe confirm its accuracy):

    *(*(*(arr+i)+j)+k)

The code I ran looks like this:

    #include <stdio.h>

    int main(void) {
        /* 24 characters: 2 blocks, each 3 rows of 4 */
        char arr[2][3][4] = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h',
                              'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p',
                              'q', 'r', 's', 't', 'u', 'v', 'w', 'x' };

        printf("%c \n", arr[1][1][1]);
        printf("%c \n", *(*(*(arr + 1) + 1) + 1));

        return 0;
    }

You can change up the values in the printf statements (so long as the values for i, j, and k are the same between the two). At any rate, the two always agree on which letter is printed to the terminal.

The way I arrived at this presumed equivalence is depicted in the picture below (sorry, I used the Math Stack Exchange writing software as it made everything a little bit easier; if the notation for the pointer arithmetic is a little confusing, I apologize.)

[Image: Derivation]
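
In text form, the derivation applies the identity a[i] == *(a + i) to one subscript at a time, starting from the leftmost:

    arr[i][j][k]
    == (*(arr + i))[j][k]        /* arr[i]  becomes *(arr + i) */
    == (*(*(arr + i) + j))[k]    /* (...)[j] becomes *(... + j) */
    == *(*(*(arr + i) + j) + k)  /* (...)[k] becomes *(... + k) */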

So my main confusion arises from the dereference operator *. Quite frankly, I don't understand what it is telling the compiler to do.

For a simple example, let's assume that the character array arr starts at a base address of 100.

arr is effectively a pointer to a 2D matrix of type char[3][4]. I believe that means (because a character requires 1 byte) the code arr + 1 is actually saying 100 + 12. Rewriting my equation, I would then have:

*(*(*(112)+1)+1). I am fairly certain that the dereference operator (to the left of 112) takes precedence over the + symbol (to the right of 112).
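
As a sanity check on the 100 + 12 arithmetic (with real addresses rather than my made-up base of 100), here is a little test I can run to print the stride directly:

    #include <stdio.h>

    int main(void) {
        char arr[2][3][4];

        /* the distance between consecutive top-level elements
           is sizeof(char[3][4]) == 12 bytes */
        printf("%td\n", (char *)(arr + 1) - (char *)arr); /* prints 12 */
        return 0;
    }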

From my experience with the dereference symbol in 1D cases, this would produce the value stored at memory cell 112. But that just seems wrong in the current situation. Clearly, I am not understanding how the dereference operation behaves in this context. Any help is greatly appreciated!

Cheers~

Upvotes: 1

Views: 247

Answers (3)

chqrlie

Reputation: 145317

Subscripting arrays and pointers has the same meaning: a[i] tells the compiler to compute the address of the i-th element of the array. This address will be read if you use a[i] as an rvalue.

If a is a pointer, this computation requires reading the contents of the pointer and adding i times the size of the array element to it.

If a is an array, this computation just involves adding i times the size of the array element to the address of the array.

In both cases, a[i] can be rewritten as *(a+i) because, if a is an array, it automatically decays to a pointer to its first element in the expression a+i. This is conceptually equivalent to:

    int a[10];
    int i, x;

    x = a[i];          /* subscripting the array directly */

    int *a_ = &a[0];   /* the pointer that a decays to */
    x = *(a_ + i);     /* the equivalent pointer arithmetic */

Note that the * in *(a+i) causes the address computed by a+i to be read if the value is used in an rvalue context, such as x = *(a+i);, or written to if it appears in an lvalue context: *(a+i) = x;. The statement *(a+i); by itself neither reads nor writes the array element (unless a is declared volatile).
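
A minimal illustration of the three contexts (the names a and x are just placeholders):

    int main(void) {
        int a[10] = {0};
        int x;

        *(a + 3) = 42;  /* lvalue context: the element is written */
        x = *(a + 3);   /* rvalue context: the element is read    */
        *(a + 3);       /* neither reads nor writes the element   */

        (void)x;
        return 0;
    }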

With your example, char arr[2][3][4], arr[0][0][0] has the same address as arr[0][0], arr[0] and arr; they just have different types.
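
You can verify the "same address, different types" point by printing the addresses (a quick sketch; the void * casts are required for %p):

    #include <stdio.h>

    int main(void) {
        char arr[2][3][4];

        /* four expressions, one address; the (decayed) types are
           char (*)[3][4], char (*)[4], char * and char * */
        printf("%p %p %p %p\n",
               (void *)arr, (void *)arr[0],
               (void *)arr[0][0], (void *)&arr[0][0][0]);
        return 0;
    }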

Upvotes: 1

No, pointers in C are not integers; they have distinct types, and those used with the array indexing operator or pointer arithmetic must refer to objects of complete type, i.e. whose size is known where they are used.

The declaration

char arr[2][3][4];

could be read as: arr[i][j][k] is of type char.

It also tells us that arr itself is of type char[2][3][4]. Each first-level element, i.e. arr[i], is of type char[3][4], and each 2nd-level element is of type char[4]. If you apply sizeof to these types (checked by the sketch just after this list):

  • sizeof(char) is 1
  • sizeof(char[4]) is 4
  • sizeof(char[3][4]) is 12
  • sizeof(char[2][3][4]) is 24
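
A small verification sketch; these asserts hold on any conforming implementation, since sizeof(char) is 1 by definition and array types contain no padding between elements:

    #include <assert.h>

    int main(void) {
        assert(sizeof(char) == 1);
        assert(sizeof(char[4]) == 4);
        assert(sizeof(char[3][4]) == 12);
        assert(sizeof(char[2][3][4]) == 24);
        return 0;
    }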

Now, in almost any context, an expression of array type is said to decay to a pointer to the first element. That applies to arr too. Since the first element of arr is arr[0], and from above the type of arr[0] is that of arr[i], i.e. char[3][4], the resulting type is a pointer to a char[3][4], which in C is written as char (*)[3][4].

If you add an integer to that pointer, the resulting pointer will point to the nth (0-based) object of that type.

I.e. arr + 0 would result in a pointer that points to the first char[3][4] subarray of the array; arr + 1 to the second, and so forth. Also, arr + n would point at the memory location (byte) which is n * sizeof(char[3][4]) bytes (equivalently, n * sizeof(arr[0]) bytes) from the beginning of arr.
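
As a sketch of that arithmetic (p is my name for the resulting pointer):

    #include <assert.h>

    int main(void) {
        char arr[2][3][4];
        char (*p)[3][4] = arr + 1; /* points at the second char[3][4] subarray */

        /* 1 * sizeof(char[3][4]) == 12 bytes from the start of arr */
        assert((char *)p - (char *)arr == 12);
        return 0;
    }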

Now, if we dereference that pointer, e.g. *(arr + 1), the resulting expression will be an array of type char[3][4]. But that value would immediately decay to a pointer to the first element; the elements of that array are of type char[4], and the pointer type would be char (*)[4].

If we add an integer, as above, we get a pointer to the nth char[4] subarray of that array, i.e. with *(arr + 1) + 1. Now if we dereference that pointer, *(*(arr + 1) + 1), we have an expression of type char[4], but it immediately decays to a pointer to its first element. Since the elements of char[4] are characters, the type would be plain and simple Garak... err, char *.

Now if we add an integer to that, we get a pointer to the nth element in the char[4] array (e.g. *(*(arr + 1) + 1) + 1), and if we dereference this, *(*(*(arr + 1) + 1) + 1), we get an lvalue of type char which designates this particular character within the array.
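
Spelled out with explicitly typed intermediates (the pN names are mine, added only to make each step's type visible):

    #include <stdio.h>

    int main(void) {
        char arr[2][3][4] = {
            { {'a','b','c','d'}, {'e','f','g','h'}, {'i','j','k','l'} },
            { {'m','n','o','p'}, {'q','r','s','t'}, {'u','v','w','x'} }
        };

        char (*p2)[3][4] = arr + 1;        /* first decay, then + 1  */
        char (*p1)[4] = *(arr + 1) + 1;    /* second decay, then + 1 */
        char *p0 = *(*(arr + 1) + 1) + 1;  /* third decay, then + 1  */
        char c = *(*(*(arr + 1) + 1) + 1); /* final lvalue, read     */

        (void)p2;
        (void)p1;
        printf("%c %c\n", c, *p0); /* both print 'r', i.e. arr[1][1][1] */
        return 0;
    }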


Why is the dereferencing operator needed, then? Of course, to refer to the subobjects.

If you remove the stars, (((arr + 1) + 1) + 1) would actually be equal to arr + 3, which would be a pointer to arr[3], i.e. the 4th char[3][4] subarray of arr. Since arr only has 2 such subarrays, not only would it fail to do what you wanted, but it would result in the dreaded undefined behaviour.
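
To demonstrate that safely, here is a sketch using a hypothetical larger array big with 4 top-level subarrays, so that big + 3 is still a valid pointer:

    int main(void) {
        char big[4][3][4];

        /* without the dereferences, every + 1 happens at the top level */
        char (*q)[3][4] = ((big + 1) + 1) + 1; /* same as big + 3, i.e. &big[3] */
        (void)q;

        /* with char arr[2][3][4], even forming arr + 3 is undefined:
           only arr + 0 through arr + 2 (one past the end) may be computed */
        return 0;
    }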


TL;DR:

Given the above definition, arr + 1 and *(arr + 1), after decay, will result in pointers pointing to the same memory location, but having distinct types.
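
A quick check of that claim (the names a and b are mine; note the two distinct pointer types):

    #include <stdio.h>

    int main(void) {
        char arr[2][3][4];

        char (*a)[3][4] = arr + 1;   /* pointer to char[3][4]                */
        char (*b)[4] = *(arr + 1);   /* char[3][4] decayed to ptr to char[4] */

        printf("%d\n", (void *)a == (void *)b); /* prints 1: same location */
        return 0;
    }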

Upvotes: 4

HolyBlackCat

Reputation: 96959

There are two things you need to understand:

  • Arrays are not pointers, even though they are automatically converted to pointers in most situations.

  • Multi-dimensional arrays get zero special treatment compared to one-dimensional arrays. They are one-dimensional arrays, where each element is also an array.


When you do *(*(*(arr+1)+1)+1), arr is automatically converted to the pointer to its first element. This pointer has type char(*)[3][4] (pointer to char[3][4]). Adding 1 to it increases the value of the pointer by 1*sizeof(char[3][4]) == 12. You seem to already know this.

Dereferencing the resulting pointer gives you an lvalue of type char[3][4], which in turn decays to a pointer of type char(*)[4] equal to &arr[1][0]. Adding 1 to that pointer increments it by 1*sizeof(char[4]); after that it's equal to &arr[1][1].

Dereferencing the resulting pointer gives you an lvalue of type char[4]. It also decays to a pointer of type char * equal to &arr[1][1][0]. After adding 1 to it, its value increases by 1*sizeof(char) and becomes equal to &arr[1][1][1]. Then the final dereference gives you the value of arr[1][1][1].
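
The three steps can be checked mechanically (a small sketch using assert):

    #include <assert.h>

    int main(void) {
        char arr[2][3][4] = { {{0}} };

        assert((void *)(arr + 1) == (void *)&arr[1]);
        assert((void *)(*(arr + 1) + 1) == (void *)&arr[1][1]);
        assert((void *)(*(*(arr + 1) + 1) + 1) == (void *)&arr[1][1][1]);
        return 0;
    }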

Upvotes: 4
