megamonium

Reputation: 483

Pointer arithmetic - how does the compiler determine the number of bytes to increment?

Consider the following piece of code.

#include <iostream>

int main(){
  int a[] = {1,2,3,4,5};
  int b = 4;  // last valid index; a[5] would read past the end of the array
  std::cout << a[b] << std::endl;
  std::cout << b[a] << std::endl;
}

I understand that a[b] and b[a] are identical, as specified by the standard:

Except where it has been declared for a class (13.5.5), the subscript operator [] is interpreted in such a way that E1[E2] is identical to *((E1)+(E2)). Because of the conversion rules that apply to +, if E1 is an array and E2 an integer, then E1[E2] refers to the E2-th member of E1. Therefore, despite its asymmetric appearance, subscripting is a commutative operation.

However, I still don't quite understand. The compiler does address arithmetic in bytes. Assuming an int takes up 4 bytes, both a[b] and b[a] are translated into *(a + b * 4). My question is: how does the compiler determine that the correct translation is *(a + b * 4) rather than *(b + a * 4)? Given an expression of the form E1[E2], the compiler could translate it into either *(E1 + E2 * 4) or *(E2 + E1 * 4). How does it know which one is the correct choice?

Upvotes: 3

Views: 263

Answers (2)

dxiv

Reputation: 17668

Imagine a language C±± which is just like C++ except that it has no notion of array indexing and no subscript operator []. All the other C++ rules and definitions still apply, though.

Except where it has been declared for a class (13.5.5), the subscript operator [] is interpreted in such a way that E1[E2] is identical to *((E1)+(E2)).

What the C++ standard says here can be loosely read as: the C++ compiler first translates all subscript expressions E1[E2] into *((E1)+(E2)). The result is valid C±± code, which is then further evaluated according to the C±± rules.

This means that a[b] and b[a] get translated to *(a + b) and *(b + a), respectively, which are identical since addition is commutative in C++ (and therefore C±±).

Upvotes: -2

Sam Varshavchik

Reputation: 118445

It is not the size of the object that determines this. It's the actual, complete type of the object.

The compiler knows the actual type of every object. It knows not just how many bytes a and b occupy, but also that a is an array of int, which converts to a pointer to int in this expression, and that b is of integral type. This is a fundamental aspect of C++: the type of every object is, and must be, known at compile time.

So when a pointer is added to an integer, the integer value gets multiplied by the size of the type being pointed to. It doesn't matter which operand is on the left side and which is on the right side of the + operator. If one operand is a pointer and the other is of integer type, this is what happens in C++.

Upvotes: 5
