MsA

Reputation: 2979

Address and contents of array variable vs pointer in C

I know that an array in C essentially behaves like a pointer, except in a few places such as sizeof. Apart from that, pointer and array variables don't differ except in their declaration.

For example consider the two declarations:

int arr[] = {11,22,33};
int *arrptr = arr; 

Here is how they behave the same:

printf("%d  %d", arrptr[0], arr[0]);  //11  11
printf("%d  %d", *arrptr, *arr);      //11  11

But here is one more place where I found that they differ:

//the outputs will be different on your machine
printf("%d  %d", &arrptr, &arr);   //2686688  2686692 (obviously different address)
printf("%d  %d", arrptr, arr);     //2686692  2686692 (both same)

The issue is with the last line. I understand that arrptr contains the address of arr, which is why the first address printed in the last line is 2686692. I also understand that logically the address (&arr) and the contents (arr) of arr should be the same, unlike arrptr. But what exactly happens internally, at the implementation level, to make this so?

Upvotes: 1

Views: 178

Answers (2)

John Bode

Reputation: 123448

I know that an array in C essentially behaves like a pointer, except in a few places such as sizeof. Apart from that, pointer and array variables don't differ except in their declaration.

This is not quite true. Array expressions are treated as pointer expressions in most circumstances, but arrays and pointers are completely different animals.

When you declare an array as

T a[N];

it's laid out in memory as

   +---+ 
a: |   | a[0]
   +---+
   |   | a[1]
   +---+
   |   | a[2]
   +---+
    ...
   +---+
   |   | a[N-1]
   +---+

One thing immediately becomes obvious: the address of the first element of the array is the same as the address of the array itself. Thus, &a[0] and &a will yield the same address value, although the types of the two expressions are different (T * vs. T (*)[N]), and the value may possibly be adjusted based on type.

Here's where things get a little confusing - except when it is the operand of the sizeof or unary & operator, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.

This means the expression a also yields the same address value as &a[0] and &a, and it has the same type as &a[0]. Putting this all together:

Expression        Type        Decays to         Value
----------        ----        ---------         -----
         a        T [N]       T *               Address of a[0]
        &a        T (*)[N]    n/a               Address of a
        *a        T           n/a               Value of a[0]
      a[i]        T           n/a               Value of a[i]
     &a[i]        T *         n/a               Address of a[i]
  sizeof a        size_t      n/a               Number of bytes in a
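
As a rough check of the table above, here is a minimal sketch with T = int and N = 3. The function names (same_address, array_bytes, print_addresses) are made up for illustration and are not part of the question's code.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if a, &a and &a[0] all convert to the same void * value. */
int same_address(void)
{
    int a[3] = {11, 22, 33};
    void *decayed = (void *) a;      /* a decays to int * */
    void *whole   = (void *) &a;     /* &a has type int (*)[3] */
    void *first   = (void *) &a[0];  /* &a[0] has type int * */
    return decayed == whole && whole == first;
}

/* sizeof sees the whole array, because no decay happens for its operand. */
size_t array_bytes(void)
{
    int a[3] = {11, 22, 33};
    return sizeof a;
}

/* Prints the three addresses; the actual values vary from run to run. */
void print_addresses(void)
{
    int a[3] = {11, 22, 33};
    printf("a = %p, &a = %p, &a[0] = %p\n",
           (void *) a, (void *) &a, (void *) &a[0]);
}
```

Calling print_addresses() from main should show the same address three times, while array_bytes() returns 3 * sizeof (int) rather than the size of a pointer.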

So why does this conversion rule exist in the first place?

C was derived from an earlier language called B (go figure). B was a typeless language - everything was treated as basically an unsigned integer. Memory was seen as a linear array of fixed-length "cells". When you declared an array in B, an extra cell was set aside to store the offset to the first element of the array:

  +---+
a:|   | ----+
  +---+     |
   ...      |
    +-------+
    |
    V
  +---+
  |   | a[0]
  +---+
  |   | a[1]
  +---+
   ...
  +---+
  |   | a[N-1]
  +---+

The array subscript operation a[i] was defined as *(a + i); that is, take the offset value stored in a, add i, and dereference the result.

When Ritchie was designing C, he wanted to keep B's array semantics, but couldn't figure out what to do with the explicit pointer to the first element, so he got rid of it. Thus, C keeps the array subscripting definition a[i] == *(a + i) (given the address a, offset i elements from that address and dereference the result), but doesn't set aside space for a separate pointer to the first element of the array - instead, it converts the array expression a to a pointer value.
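
The subscripting definition can be checked directly. This is only a sketch, and subscript_forms_agree is a hypothetical helper name. Because a[i] is defined as *(a + i) and addition commutes, the odd-looking i[a] is also valid and refers to the same element.

```c
#include <stddef.h>

/* Returns 1 if a[i], *(a + i) and i[a] agree for every index of the array. */
int subscript_forms_agree(void)
{
    int a[] = {11, 22, 33};
    size_t n = sizeof a / sizeof a[0];   /* element count; no decay under sizeof */

    for (size_t i = 0; i < n; i++) {
        /* In a + i the array expression a is converted to a pointer to a[0]. */
        if (a[i] != *(a + i) || a[i] != i[a])
            return 0;
    }
    return 1;
}
```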

This is why you see the same output when you print the values of arr and arrptr. Note that you should print out pointer values using the %p conversion specifier and cast the argument to void *:

printf( "arr = %p, arrptr = %p\n", (void *) arr, (void *) arrptr );


This is pretty much the only place you need to explicitly cast a pointer value to void * in C.

Upvotes: 2

Blagovest Buyukliev

Reputation: 43498

When the unary & operator is applied to an array, it returns a pointer to an array. When applied to a pointer, it returns a pointer to a pointer. This operator and sizeof are among the few contexts where arrays do not decay to pointers.

In other words, &arrptr is of type int **, whereas &arr is of type int (*)[3]. &arrptr is the address of the pointer itself and &arr is the beginning of the array (like arrptr).
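
One way to see those two types is to ask the compiler. The sketch below assumes a C11 compiler, since it relies on _Generic. The TYPE_NAME macro and both function names are invented for this example.

```c
/* Maps an expression to a printable name of its type (requires C11). */
#define TYPE_NAME(x) _Generic((x),      \
        int *:      "int *",            \
        int **:     "int **",           \
        int (*)[3]: "int (*)[3]",       \
        default:    "other")

/* &arr is a pointer to the whole 3-element array. */
const char *type_of_address_of_array(void)
{
    int arr[] = {11, 22, 33};
    return TYPE_NAME(&arr);
}

/* &arrptr is a pointer to the pointer variable itself. */
const char *type_of_address_of_pointer(void)
{
    int arr[] = {11, 22, 33};
    int *arrptr = arr;
    return TYPE_NAME(&arrptr);
}
```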

The subtle part: arrptr and &arr have the same value (both point to the beginning of the array), but are of different types. This difference will show if you do any pointer arithmetic on them – with arrptr the implied offset will be sizeof(int), whereas with &arr it will be sizeof(int) * 3.
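
The difference in stride can be measured by converting to char * and subtracting. This is a sketch with hypothetical function names. Forming &arr + 1 is fine because it is a one-past-the-end pointer, as long as it is never dereferenced.

```c
#include <stddef.h>

/* Bytes skipped by arrptr + 1, where arrptr has type int *. */
size_t element_stride(void)
{
    int arr[] = {11, 22, 33};
    int *arrptr = arr;
    return (size_t) ((char *) (arrptr + 1) - (char *) arrptr);
}

/* Bytes skipped by &arr + 1, where &arr has type int (*)[3]. */
size_t array_stride(void)
{
    int arr[] = {11, 22, 33};
    int (*whole)[3] = &arr;
    return (size_t) ((char *) (whole + 1) - (char *) whole);
}
```

On any conforming implementation element_stride() returns sizeof (int) and array_stride() returns 3 * sizeof (int).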

Also, you should be using the %p format specifier to print pointers, after casting to void *.

Upvotes: 3
