rwallace

Reputation: 33475

Is it valid to calculate element pointers by explicit arithmetic?

Is the following program valid? (In the sense of being well-defined by the ISO C standard, not just happening to work on a particular compiler.)

struct foo {
  int a, b, c;
};

int f(struct foo *p) {
  // should return p->c
  char *q = ((char *)p) + 2 * sizeof(int);
  return *((int *)q);
}

It follows at least some of the rules for well-defined use of pointers:

But I'm still not sure it's valid to explicitly calculate and use element pointers that way.

Upvotes: 3

Views: 72

Answers (3)

supercat

Reputation: 81247

I think it likely that at least some authors of the Standard intended to allow a compiler given something like:

struct foo { unsigned char a[4], b[4]; } x;
int test(int i)
{
  x.b[0] = 1;
  x.a[i] = 2;
  return x.b[0];
}

to generate code that would always return 1 regardless of the value of i. On the flip side, I think it is extremely likely that nearly all of the Committee would have intended that a function like:

struct foo { unsigned char a[4], b[4]; } x;
void put_byte(int);

void test2(unsigned char *p, int sz)
{
  for (int i=0; i<sz; i++)
    put_byte(p[i]);
}

be capable of outputting all of the bytes in x in a single invocation.

Clang and gcc will assume that any construct which applies the [] operator to a struct or union member array will only be used to access elements of that member array. The Standard, however, defines the behavior of arrayLValue[index] as equivalent to (*((arrayLValue)+index)), and defines the address of x.a's first element, which has type unsigned char*, as equivalent to the address of x converted to that type. Thus, if code calls test2((unsigned char*)&x, sizeof x), the expression p[i] would be equivalent to x.a[i], which clang and gcc would only support for subscripts in the range 0 to 3.

The only way I see of reading the Standard as satisfying both viewpoints would be to treat support for even the latter construct as a "quality of implementation" issue outside the Standard's jurisdiction, on the assumption that quality implementations would support constructs like the latter with or without a mandate, and there was thus no need to write sufficiently detailed rules to distinguish those two scenarios.
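To make the second scenario concrete, here is a minimal sketch of the call being discussed. The answer only declares put_byte, so the buffer-backed definition below, along with the names sink, sink_len and dump_x, is a hypothetical stand-in added purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct foo { unsigned char a[4], b[4]; } x;

/* Hypothetical stand-in for put_byte: records each byte in a buffer. */
static unsigned char sink[64];
static size_t sink_len = 0;

void put_byte(int c) {
  assert(sink_len < sizeof sink);  /* the sketch assumes a small object */
  sink[sink_len++] = (unsigned char)c;
}

void test2(unsigned char *p, int sz) {
  for (int i = 0; i < sz; i++)
    put_byte(p[i]);
}

/* Passes a pointer to the whole object x, so the loop in test2 is meant
   to visit every byte of x, not only the four bytes of x.a. */
size_t dump_x(void) {
  sink_len = 0;
  test2((unsigned char *)&x, (int)sizeof x);
  return sink_len;
}
```

Whether p[i] here is an access to the object x or an out-of-range access to x.a is exactly the ambiguity described above.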

Upvotes: 0

Davislor

Reputation: 15144

The C standard allows there to be arbitrary padding between elements of a struct (but not at the beginning of one). Real-world compilers won’t insert padding into a struct like that one, but the DeathStation 9000 is allowed to. If you want to do that portably, use the offsetof() macro from <stddef.h>.

*(int*)((char*)p + offsetof(struct foo, c))

is guaranteed to work. A difference, such as offsetof(struct foo, c) - offsetof(struct foo, b), is also well-defined. (Although, since offsetof() yields an unsigned value of type size_t, the result wraps around to a large unsigned number if the difference would be negative.)
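As a rough sketch of the offsetof approach, the function from the question could be rewritten along these lines. The names get_c and gap_b_to_c are made up for this example; the second helper converts both offsets to a signed type before subtracting, which sidesteps the unsigned wraparound.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t, ptrdiff_t */

struct foo {
  int a, b, c;
};

/* Returns p->c by adding the member's byte offset to the struct address.
   offsetof accounts for any padding the implementation inserts. */
int get_c(struct foo *p) {
  assert(p != NULL);
  char *q = (char *)p + offsetof(struct foo, c);
  return *(int *)q;
}

/* Hypothetical helper: signed distance in bytes from member b to member c. */
ptrdiff_t gap_b_to_c(void) {
  return (ptrdiff_t)offsetof(struct foo, c) -
         (ptrdiff_t)offsetof(struct foo, b);
}
```

With struct foo s = {1, 2, 3}, get_c(&s) reads the same object as s.c regardless of how the members are padded.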

In practice, of course, use &p->c.

An expression like the one in your original question is guaranteed to work for array elements, however, so long as you do not overrun your buffer. You can also generate a pointer one past the end of an array and compare that pointer to a pointer within the array, but dereferencing such a pointer is undefined behavior.
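For the array case, a small sketch might look like this. The function names sum_ints and nth_int are invented for the example; the first uses a one-past-the-end pointer only for comparison, and the second repeats the question's char * arithmetic on an array of int, where consecutive elements have no padding between them.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n ints; the one-past-the-end pointer is compared, never dereferenced. */
int sum_ints(const int *arr, size_t n) {
  assert(arr != NULL || n == 0);
  const int *end = arr + n;
  int total = 0;
  for (const int *p = arr; p != end; ++p)
    total += *p;
  return total;
}

/* Element i computed by explicit byte arithmetic; the caller must keep i
   below the number of elements in the array. */
int nth_int(int *arr, size_t i) {
  assert(arr != NULL);
  char *q = (char *)arr + i * sizeof(int);
  return *(int *)q;
}
```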

Upvotes: 1

MK.

Reputation: 34587

C is a low-level programming language. This code is well-defined but probably not portable, because it makes assumptions about the layout of the struct. In particular, you might run into fields being 64-bit aligned on a 64-bit platform where int is 32 bits. A better way of doing it is to use the offsetof macro.
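If the hard-coded offset has to stay for some reason, one possible way to make the layout assumption explicit is a compile-time check. This is only a sketch and assumes a C11 compiler for static_assert; the build fails on any implementation that pads struct foo differently.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */

struct foo {
  int a, b, c;
};

/* Compilation stops here if member c is not exactly two ints into the struct. */
static_assert(offsetof(struct foo, c) == 2 * sizeof(int),
              "struct foo has unexpected padding before member c");

int f(struct foo *p) {
  /* Same arithmetic as in the question, now guarded by the check above. */
  char *q = (char *)p + 2 * sizeof(int);
  return *(int *)q;
}
```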

Upvotes: 4
