Reputation: 33475

Is it valid to calculate element pointers by explicit arithmetic?

Is the following program valid? (In the sense of being well-defined by the ISO C standard, not just happening to work on a particular compiler.)

struct foo {
  int a, b, c;
};

int f(struct foo *p) {
  // should return p->c
  char *q = ((char *)p) + 2 * sizeof(int);
  return *((int *)q);
}

It follows at least some of the rules for well-defined use of pointers:

The value being loaded is of the same type as the value that was stored at that address.
The provenance of the calculated pointer is valid: it is derived from a valid pointer by adding an offset that yields a pointer still within the original storage instance.
There is no mixing of element types within the struct that would generate padding and make an element offset unpredictable.

But I'm still not sure it's valid to explicitly calculate and use element pointers that way.

Upvotes: 3

Answers (3)

supercat

Reputation: 81247

I think it likely that at least some authors of the Standard intended to allow a compiler given something like:

struct foo { unsigned char a[4], b[4]; } x;
int test(int i)
{
  x.b[0] = 1;
  x.a[i] = 2;
  return x.b[0];
}

to generate code that would always return 1 regardless of the value of i. On the flip side, I think it is extremely likely that nearly all of the Committee would have intended that a function like:

struct foo { unsigned char a[4], b[4]; } x;
void put_byte(int);

void test2(unsigned char *p, int sz)
{
  for (int i=0; i<sz; i++)
    put_byte(p[i]);
}

be capable of outputting all of the bytes in x in a single invocation.

Clang and gcc will assume that any construct which applies the [] operator to a struct or union member array will only be used to access elements of that member array. However, the Standard defines the behavior of arrayLValue[index] as equivalent to (*((arrayLValue)+index)), and would define the address of x.a's first element, which is an unsigned char*, as equivalent to the address of x, cast to that type. Thus, if code calls test2((unsigned char*)&x, sizeof x), the expression p[i] would be equivalent to x.a[i], which clang and gcc would only support for subscripts in the range 0 to 3.

The only way I see of reading the Standard as satisfying both viewpoints would be to treat support for even the latter construct as a "quality of implementation" issue outside the Standard's jurisdiction, on the assumption that quality implementations would support constructs like the latter with or without a mandate, and there was thus no need to write sufficiently detailed rules to distinguish those two scenarios.
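
For concreteness, the second scenario can be sketched as a small compilable unit. This is only an illustration under assumptions: the buffer-backed put_byte, the sink array, and the dump_foo driver are hypothetical names that are not part of the answer, and the struct uses unsigned char members as in the first example. It shows the call pattern being discussed and does not settle what the Standard mandates.

```c
#include <assert.h>
#include <stddef.h>

struct foo { unsigned char a[4], b[4]; };

/* Hypothetical sink standing in for put_byte: records bytes in a buffer. */
static unsigned char sink[64];
static size_t sink_len = 0;

void put_byte(int c) {
  assert(sink_len < sizeof sink);
  sink[sink_len++] = (unsigned char)c;
}

/* Same loop as in the answer: walks sz bytes starting at p. */
void test2(unsigned char *p, int sz) {
  for (int i = 0; i < sz; i++)
    put_byte(p[i]);
}

/* Hypothetical driver: passes the address of the whole struct, so the
   loop index runs past the end of member a and into member b. */
size_t dump_foo(struct foo *x) {
  sink_len = 0;
  test2((unsigned char *)x, (int)sizeof *x);
  return sink_len;
}
```

Calling dump_foo(&x) is expected to record sizeof x bytes, which is the behavior the answer argues nearly all of the Committee would have wanted.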

Upvotes: 0

Davislor

Reputation: 15144

The C standard allows there to be arbitrary padding between elements of a struct (but not at the beginning of one). Real-world compilers won’t insert padding into a struct like that one, but the DeathStation 9000 is allowed to. If you want to do that portably, use the offsetof() macro from <stddef.h>.

*(int*)((char*)p + offsetof(struct foo, c))

is guaranteed to work. A difference, such as offsetof(struct foo, c) - offsetof(struct foo, b), is also well-defined. (Note that offsetof() yields a value of type size_t, which is unsigned, so a subtraction whose second offset is the larger one wraps around to a large unsigned number.)
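
A minimal sketch of the offsetof() approach as a complete translation unit, assuming the struct foo from the question. The helper names get_c_by_offset and gap_b_to_c are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */

struct foo {
  int a, b, c;
};

/* Hypothetical helper: reads member c through a byte offset taken from
   offsetof() instead of a hand-computed 2 * sizeof(int). */
int get_c_by_offset(struct foo *p) {
  assert(p != NULL);
  char *q = (char *)p + offsetof(struct foo, c);
  return *(int *)q;
}

/* Hypothetical helper: byte distance from member b to member c.
   c is declared after b, so the unsigned subtraction cannot wrap. */
size_t gap_b_to_c(void) {
  return offsetof(struct foo, c) - offsetof(struct foo, b);
}
```

With struct foo s = {1, 2, 3}, get_c_by_offset(&s) reads the same object as s.c, whatever padding the implementation chose.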

In practice, of course, use &p->c.

An expression like the one in your original question is guaranteed to work for array elements, however, so long as you do not overrun your buffer. You can also generate a pointer one past the end of an array and compare that pointer to a pointer within the array, but dereferencing such a pointer is undefined behavior.
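
The array case can be sketched as follows. Both helper names (sum_ints, element_via_bytes) are hypothetical, and the code assumes arr points to the first element of an array of at least n ints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums n ints by walking a pointer up to the
   one-past-the-end position, which is compared but never dereferenced. */
int sum_ints(const int *arr, size_t n) {
  const int *end = arr + n;
  int total = 0;
  for (const int *it = arr; it != end; ++it)
    total += *it;
  return total;
}

/* Hypothetical helper: reaches element i by explicit byte arithmetic,
   as in the question, but within a single array. */
int element_via_bytes(int *arr, size_t n, size_t i) {
  assert(i < n);  /* stay inside the buffer */
  char *q = (char *)arr + i * sizeof(int);
  return *(int *)q;
}
```

Here the stride between elements is exactly sizeof(int), because arrays have no padding between elements.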

Upvotes: 1

MK.

Reputation: 34587

C is a low-level programming language. This code is well-defined but probably not portable, because it makes assumptions about the layout of the struct. In particular, you might run into fields being 64-bit aligned on a 64-bit platform where int is 32 bits. A better way of doing it is to use the offsetof macro.
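
If the hand-computed offset is kept anyway, one option is to make the layout assumption explicit so that a mismatching platform fails at compile time. This is a sketch assuming a C11 or later compiler; the name f_checked is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
  int a, b, c;
};

/* Compilation stops here on any implementation that pads the struct
   so that c does not sit exactly two ints from the start. */
_Static_assert(offsetof(struct foo, c) == 2 * sizeof(int),
               "struct foo: unexpected padding before member c");

/* Hypothetical variant of the question's f(), guarded by the check above. */
int f_checked(struct foo *p) {
  assert(p != NULL);
  char *q = (char *)p + 2 * sizeof(int);
  return *(int *)q;
}
```

The assertion only documents and enforces the layout assumption. It does not change the argument for using offsetof directly.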

Upvotes: 4
