Reputation: 65
I have seen quite a lot of code like the following (abstract example):
char* byteBlockPtr;
long* alignedPtr = NULL;
/* ... */
/* aligning pointer by long boundary */
while (!ALIGNED(byteBlockPtr))
{
    byteBlockPtr++;
}
alignedPtr = (long*)byteBlockPtr;
/* ... */
/* do stuff with memory */
alignedPtr++; /* go to next block */
/* ... */
This is understandable: converting a char pointer to a pointer to a more strictly aligned type (in this case, pointer to long) requires that the address meet the stricter type's alignment requirement.
Does the same apply to void pointers?
Are there any general rules one must follow in order not to break pointer alignment when, say, writing one's own memset?
What is the connection, if any, between pointer aliasing and alignment with regard to char and void pointers as well as others? For example, since the standard allows a void pointer to be implicitly converted to any other object pointer type, does this mean the alignment requirements are guaranteed to be met as well?
P.S. Sorry in advance for more than 1 question, but apparently there is a gap in my knowledge, and I have no idea how to narrow it down.
Upvotes: 2
Views: 931
Reputation: 81217
The C Standard allows obtuse implementations to do anything they want if an object of any type is accessed using an lvalue of any type other than the specific ones listed in the Standard (character types are among those listed), regardless of whether the compiler would have reason to expect that the object is being accessed that way. It doesn't matter whether the pointers would be properly aligned or not. According to the rationale, the rule exists so that given code like:
float f;
void hey(int *p)
{
    f=1.0f;
    *p=6;
    f+=1.0f;
}
a compiler won't have to pessimistically assume that p might hold the address of f and thus write f before the pointer assignment and read it afterward. In a case like that the compiler would have no reason to expect that a write through p would affect f, and thus no reason to expect that the redundant store and load would serve any purpose.
While there's no evidence that the authors of the Standard intended that compiler writers should be so obtuse as to ignore situations where aliasing is obvious, some compiler writers, including those involved with gcc, interpret the lack of a mandate as an indication that they should ignore obvious aliasing when doing so will facilitate more "efficient" code, without regard for whether the code in question will actually be useful.
On any platform which defines a means of checking whether a pointer is suitably aligned for a given type, converting the pointer to char*, incrementing it until it is suitably aligned (if it is not already), and then converting it to that other type will yield a suitably aligned pointer to that other type. Unfortunately, while C11 defines a standard means (_Alignas) of ensuring that an object of one type is located in a fashion meeting the alignment requirements of another, it does not define a standard means by which code can make use of such alignment without running afoul of aliasing issues.
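As a rough illustration of the "step a char pointer until it is aligned" idea, here is a minimal sketch. The helper names is_aligned and align_up_for_long are hypothetical, not anything from the Standard. It assumes that uintptr_t is available (it is optional in the Standard) and that converting a pointer to uintptr_t yields a plain address, which is implementation-defined but holds on mainstream flat-memory platforms. It addresses alignment only; it does nothing about the aliasing rules discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p sits on an `align`-byte boundary.
   Assumes align is a power of two and that the pointer-to-integer
   conversion exposes the address (implementation-defined behavior). */
int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: step a byte pointer forward until it is aligned
   for long, never moving past `end`. Returns the adjusted pointer. */
unsigned char *align_up_for_long(unsigned char *p, const unsigned char *end)
{
    while (p < end && !is_aligned(p, _Alignof(long)))
        p++;
    return p;
}
```

A caller would still need to handle the unaligned leading bytes (and any trailing bytes) separately, one byte at a time.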
If code only has to run on non-obtuse compilers, I would suggest that casting from one type to another and accessing as the latter type should be reliable provided that operations using the new type are done with a pointer that was cast from the old type to the new type after the last access using the old type, and all operations using the cast pointer are done before the next access using the old type. Most code which uses "chunking optimizations" fits that pattern, and it's an easy pattern for compilers to support without needing to make excessively-pessimistic assumptions (if code casts a pointer from type T1* to T2* and then writes to it, an assumption that such an operation is likely to affect an object of type T1 may be pessimistic, but in most cases it will also be correct).
Unfortunately, because the Standard has yet to mandate compiler recognition of aliasing even in cases where it's obvious, and the authors of gcc show no interest in such recognition absent a mandate, there's no way to safely use chunking optimizations in gcc without either using non-standard gcc-specific extensions or else using the -fno-strict-aliasing flag. Getting good performance while using that flag will require learning to use the restrict qualifier, but using chunking to speed up hot loops and using restrict to minimize the performance impact of -fno-strict-aliasing seems like a better approach than using slow non-chunked loops. Note also that gcc will often process code which uses chunking optimizations correctly with or without the flag, but the authors of gcc consider any correct behavior when such code is compiled without the flag as "accidental" and have no aversion to "fixing" [i.e. breaking] such code without warning.
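To make the role of restrict concrete, here is a small sketch. The function name scale_and_count is made up for illustration. With -fno-strict-aliasing, a compiler would otherwise have to assume that the store through count might modify elements of values; the restrict qualifiers are the caller's promise that the two pointers refer to disjoint storage, so the compiler is free to keep *count in a register across the loop. Whether a given compiler actually does so is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Doubles each element of values and adds n to *count.
   restrict: the caller promises that values[0..n-1] and *count do not
   overlap; calling this with overlapping storage is undefined behavior. */
void scale_and_count(float *restrict values, int *restrict count, size_t n)
{
    assert(values != NULL && count != NULL);
    for (size_t i = 0; i < n; i++) {
        values[i] *= 2.0f;
        *count += 1;
    }
}
```

The promise is only as good as the caller: passing aliased pointers to a function like this is undefined behavior, with or without the flag.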
BTW, if one wants to use chunking optimizations in fully-conformant fashion, the only ways to accomplish that are (1) use byte-oriented code and hope the optimizer somehow figures out how to replace it with a chunked version, or (2) use memcpy/memmove to load word-sized variables from other storage and hope the optimizer manages to replace them with sane code. For example, if one has a 64-bit-aligned pointer to a bunch of uint16_t values and wishes to invert all of their bits, one could use:
#include <stdint.h>
#include <string.h>

void flip_quad16s(uint16_t *p, int num_quads)
{
    uint64_t *pp = (uint64_t*)p;
    union {
        uint64_t dw;
        uint16_t hw[4];
    } u;
    for (int i=0; i<num_quads; i++)
    {
        /* Load one 8-byte group (four uint16_t values) into the union */
        memcpy(u.hw, pp, 8);
        u.dw = ~u.dw;
        /* Note that if p actually identifies something which has no declared
           type but will be used as uint16_t, we must make sure that memcpy
           uses that as a source type, hence u.hw rather than &u.dw */
        memcpy(pp++, u.hw, 8);
    }
}
Of course, that will require the compiler to presume that p might alias anything of any type, which may prevent even a perfect optimizing compiler from achieving results as good as a non-obtuse compiler could have achieved with code that took a uint16_t*, cast it to uint64_t*, and then worked with that, e.g.
void flip_quad16s(uint16_t *p, int num_quads)
{
    uint64_t *pp = (uint64_t*)p;
    for (int i=0; i<num_quads; i++)
        pp[i] = ~pp[i];
}
It should be much easier for a sane compiler to turn the latter function into optimal code that will invert a bunch of uint16_t values than for any compiler to do likewise with the former function, especially if it's called within a loop that makes use of other types, since the use of memcpy would force a compiler to acknowledge potential aliasing of all types, rather than just uint16_t and uint64_t.
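For comparison, option (1) mentioned earlier, the byte-oriented approach, might look like the following sketch. The names flip_bytes and flip_u16s_bytewise are hypothetical. It relies only on the rule that any object may be accessed through an unsigned char lvalue, so it needs neither alignment checks nor special compiler flags; whether an optimizer turns the loop into word-sized or vector operations depends entirely on the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inverts every bit of an nbytes-long region, one byte at a time.
   Access through unsigned char is permitted to alias any object type. */
void flip_bytes(void *p, size_t nbytes)
{
    unsigned char *b = p;
    assert(p != NULL || nbytes == 0);
    for (size_t i = 0; i < nbytes; i++)
        b[i] = (unsigned char)~b[i];
}

/* Hypothetical wrapper for the uint16_t case discussed above. */
void flip_u16s_bytewise(uint16_t *p, size_t count)
{
    flip_bytes(p, count * sizeof *p);
}
```

Because every byte is inverted, the result is the same regardless of byte order.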
Upvotes: 6