Reputation: 179
It is clear to me that the C standard forbids (does not define the behavior of) this program, but it is not clear why it has to be this way. Why are the aliasing rules such that one cannot write this program?
#include<stdio.h>
#include<string.h>
#include<stdint.h>
#include<stdalign.h>
#define SIZE 512
unsigned char buffer[SIZE];
size_t free_slot = 0;
void* alloc(const size_t bytes, const size_t alignment)
{
    const uintptr_t start = (uintptr_t)(buffer+free_slot);
    /* padding needed to round start up to the next multiple of alignment */
    const size_t adjust = (size_t)((alignment - start % alignment) % alignment);
    const size_t placement = free_slot + adjust;
    const size_t next_free_slot = placement + bytes;
    /* size_t prints with %zu; uintptr_t is converted to uintmax_t and printed with %ju */
    printf("start=%ju\n",(uintmax_t)start);
    printf("adjust=%zu\n",adjust);
    printf("placement=%zu\n",placement);
    printf("next_free_slot=%zu\n",next_free_slot);
    if(SIZE < next_free_slot) return NULL;
    free_slot = next_free_slot;
    return buffer+placement;
}
struct thing {
    uint64_t x;
    uint64_t y;
};
int main(void)
{
    int* p1 = alloc(sizeof(int),alignof(int));
    printf("--------------\n");
    printf("alignof(struct thing)=%zu\n",alignof(struct thing));
    printf("--------------\n");
    struct thing* p2 = alloc(sizeof(struct thing),alignof(struct thing));
    *p1 = 143;
    memcpy(p2,&(struct thing){1,2},sizeof(struct thing));
    printf("%d\n",*p1);
    printf("%ju\n",(uintmax_t)p2->x);
    return 0;
}
Would it be possible to amend the standard to permit such a program or is this a hopeless endeavor?
Upvotes: 8
Views: 416
Reputation: 81115
The C Standard does not "forbid" such constructs in Conforming C Programs that make no claim of being Strictly Conforming C Programs. It allows implementations to process code in ways that would render such programs meaningless in cases where doing so would benefit their users (e.g. by allowing useful optimizations). It makes no effort to forbid implementations from doing so in cases where that would be detrimental to their users, because the Standard's authors never imagined that compiler writers would process programs in gratuitously meaningless fashion and then use the Standard's permission as an excuse to claim that the programs were "broken".
The C Standard explicitly acknowledges that implementations may offer stronger memory semantics than mandated by the Standard (N1570 5.1.2.3p9: "EXAMPLE 1 An implementation might define a one-to-one correspondence between abstract and actual semantics: at every sequence point, the values of the actual objects would agree with those specified by the abstract semantics. The keyword volatile would then be redundant.") Since the defined behaviors of such an implementation would be indistinguishable from one that didn't define such a correspondence, such a specification would be meaningless if it didn't serve to define behavior in situations that would otherwise invoke Undefined Behavior.
Further, an important thing to note about the Standard is that a failure to define the behavior of a construct does not imply a consensus agreement that no compilers should be expected to process it meaningfully, but merely that there wasn't a consensus agreement to require that all compilers process the construct meaningfully even on platforms where the cost of offering any behavioral guarantee consistent with sequential program execution would exceed any benefit such a guarantee might offer to a programmer.
The notion that UB was intended to require programmers to jump through hoops to avoid various situations at all costs, even on platforms that could cheaply offer useful behavioral guarantees, is a modern notion which directly contradicts the intentions of the Standard's authors as documented in the published Rationale document (search C99 Rationale).
Upvotes: 1
Reputation: 2312
Would it be possible to amend the standard to permit such a program or is this a hopeless endeavor?
The C Standard (ISO/IEC 9899) is maintained by ISO/IEC JTC1/SC22/WG14, participation in which is open to anyone; nominations are made via your appropriate National Body (e.g. BSI in the UK).
Once you are a participant in the Working Group, you are free to submit proposals to change the Standard. If a proposal gains enough support within the WG, it is included in the next draft.
But any such proposal would need to address all the consequences, side-effects or any unwanted outcomes - and remember that WG14 has a Prime Directive of "Do not break existing code" (no matter how already broken that code may be).
The draft then goes through the formal approvals process - formal votes at Committee Draft, then Draft International Standard and finally Final Draft International Standard.
So, if you have a good solid argument for change, it is possible to change the standard; however, I wouldn't hold out much hope in changing this!
Disclaimer: I am a UK delegate to WG14 (as well as the MISRA liaison)
Upvotes: 3
Reputation: 223689
As others have mentioned, strict aliasing allows certain optimizations to be made. Given that these optimizations are useful, the standard committee is unlikely to remove the rule.
That being said, particular implementations do have methods of getting around this. In particular, gcc has the malloc attribute. From the GCC documentation:
malloc
malloc (deallocator)
malloc (deallocator, ptr-index)
Attribute malloc indicates that a function is malloc-like, i.e., that the pointer P returned by the function cannot alias any other pointer valid when the function returns, and moreover no pointers to valid objects occur in any storage addressed by P. In addition, the GCC predicts that a function with the attribute returns non-null in most cases.
Independently, the form of the attribute with one or two arguments associates deallocator as a suitable deallocation function for pointers returned from the malloc-like function. ptr-index denotes the positional argument to which when the pointer is passed in calls to deallocator has the effect of deallocating it.
So if you compile with gcc and declare your function like this:
void* alloc(const size_t bytes, const size_t alignment) __attribute__((malloc));
Then you can safely use the returned memory as though it was returned by malloc, and strict aliasing can still be used elsewhere in the program.
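As a minimal sketch of where the attribute goes, the following declares a small bump allocator with it. The names MALLOC_LIKE, POOL_SIZE, pool and pool_alloc are hypothetical and not taken from the question or the GCC manual. The sketch only shows the syntax (the attribute sits on a declaration, with the definition written separately); the GCC manual describes the attribute as a no-alias promise about the returned pointer, and whether it also covers the effective-type question for a statically declared buffer is this answer's claim, not something the sketch demonstrates.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical helper macro: expands to the GCC attribute where available. */
#if defined(__GNUC__)
#define MALLOC_LIKE __attribute__((malloc))
#else
#define MALLOC_LIKE
#endif

#define POOL_SIZE 256

/* Static byte pool, aligned for any fundamental type. */
static alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;

/* The attribute is attached to the declaration. */
MALLOC_LIKE void *pool_alloc(size_t bytes);

void *pool_alloc(size_t bytes)
{
    const size_t unit = alignof(max_align_t);
    assert(bytes > 0);
    if (bytes > POOL_SIZE) return NULL;
    /* Round the request up so every returned pointer stays aligned. */
    const size_t rounded = (bytes + unit - 1) / unit * unit;
    if (rounded > POOL_SIZE - pool_used) return NULL;
    void *p = pool + pool_used;
    pool_used += rounded;
    return p;
}
```

A call such as unsigned char *p = pool_alloc(3); then hands out successive, non-overlapping slices of the pool until it is exhausted, after which NULL is returned.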
Upon further reflection, given that several compilers support attributes of some type, it would make sense to standardize many of these to give application developers more control in how code can be compiled. Given that C has been historically viewed as a "portable assembler" but standards have caused a divergence from that, putting support for such low-level behavior in the standard would probably be well-received.
Upvotes: 5
Reputation: 213276
It is clear to me that the C standard forbids this program
Not really. The standard doesn't cover what will happen if you type pun from a character array into a struct. It is undefined behavior, since it violates a "shall" in C17 6.5/7, but not a constraint.
Regarding all the "strict aliasing sucks am I right?" rants... yes and no. The original purpose of these rules was to disallow wild and crazy conversions. The C99 rationale 5.10 chapter 6.5/35 shows this example:
int a; void f(int * b) { a = 1; *b = 2; g(a); }
It is tempting to generate the call to g as if the source expression were g(1), but b might point to a, so this optimization is not safe. On the other hand, consider
int a; void f( double * b ) { a = 1; *b = 2.0; g(a); }
Again the optimization is incorrect only if b points to a. However, this would only have come about if the address of a were somewhere cast to double*. The C89 Committee has decided that such dubious possibilities need not be allowed for.
This is the original rationale, and C99 extended the unclear rules of C89 a bit with the introduction of effective type, for better and worse. The rules are still very much unclear, but the original intention was to spare compilers from having to make weird assumptions like the one above. So far, this is a perfectly sensible assumption that compilers should be allowed to make.
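The first case from the Rationale can be tried out directly. This is a sketch under assumptions: g is a hypothetical stand-in that simply returns its argument, and f is changed to return g's result so the effect is observable. Only the int* version is shown, because that is the one where b may legitimately point to a.

```c
#include <assert.h>
#include <stddef.h>

int a;

/* Hypothetical stand-in for the Rationale's g(): hands back what it received. */
static int g(int value) { return value; }

/* b has type int*, so it may legitimately point at a. */
int f(int *b)
{
    assert(b != NULL);
    a = 1;
    *b = 2;      /* overwrites a when b == &a */
    return g(a); /* a must be re-read here: g sees 2 if b == &a, otherwise 1 */
}
```

Calling f(&a) passes 2 to g, while calling f with the address of some other int passes 1, which is why the compiler cannot rewrite the call as g(1) in this version.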
Unfortunately, somewhere in the early 2000s, some compilers, most notably gcc, decided to abuse this in order to perform optimizations. Suddenly you couldn't do things like uint8_t arr[2]; ... *(uint16_t*)arr because that's, strictly speaking, a strict aliasing violation. Until C99, compilers had generated sensible code without such optimizations, but past C99 some chose to go haywire. The situation has improved somewhat over the years, but we still cannot rely on compilers to generate "the expected" code for my little uint16_t* conversion above.
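For comparison, here are two ways to get a uint16_t out of two bytes that do not go through a uint16_t* cast. This is only a sketch, and the function names load_u16_native and load_u16_le are hypothetical. The first copies the bytes with memcpy and so yields the value in the host's byte order; the second assembles the value with shifts and always treats the first byte as the low one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the two bytes into a uint16_t; the result uses the host byte order. */
uint16_t load_u16_native(const uint8_t *bytes)
{
    uint16_t value;
    assert(bytes != NULL);
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Assemble the value by hand, treating bytes[0] as the low byte. */
uint16_t load_u16_le(const uint8_t *bytes)
{
    assert(bytes != NULL);
    return (uint16_t)(bytes[0] | ((unsigned)bytes[1] << 8));
}
```

With uint8_t arr[2] = {0x34, 0x12}, load_u16_le(arr) gives 0x1234 on any host, whereas the result of load_u16_native(arr) depends on endianness.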
The number of exceptions to the strict aliasing rules in C17 6.5/7 leaves a lot to be desired. For example it is perfectly sensible to type pun between various unsigned integer types - anyone who's done hardware-related programming understands this. But this isn't allowed.
As another example, there is no mention of what happens with type qualifiers. Nobody in the whole world seems to be able to answer this: "What rules are there for qualifiers of effective type?" I have no idea what the rules are myself.
It's unclear how to use arrays in relation to effective type... the list goes on. There are numerous Defect Reports about various details of these rules, but the rules haven't been improved.
As for if your program contains any strict aliasing violations and how to fix it:
unsigned char buffer[SIZE]; has the effective type (array of) unsigned char.
const uintptr_t start = (uintptr_t)(buffer+free_slot); is fine assuming that you don't end up with misalignment, but that's a separate issue.
If you access the buffer through a pointer to int or a struct type etc., there is a strict aliasing violation, since this is not one of the allowed exceptions in the list 6.5/7. The other way around - going from a larger type and accessing byte by byte with character type pointers - would be fine.
So to fix it you have to make something like this, for the int example:
typedef union
{
    int i;
    unsigned char bytes[sizeof(int)];
} intalias_t;
Now you can do:
intalias_t* p1 = alloc(sizeof(int),alignof(int));
(*p1).i = 143; // well-defined
Because (*p1).i is "an lvalue expression that" is "an aggregate or union type that includes" "a type compatible with the effective type of the object". That is, the union contains a character type array which is (supposedly) compatible with the effective type which is also a character type. "Supposedly" since the rules are muddy when it comes to array access. And if your original array or the one in the union contained a type qualifier, nobody knows(?) what will happen.
When in doubt/as a rule of thumb, use -fno-strict-aliasing.
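A different route from the union above, which relies on neither the union wording nor a compiler flag, is to move values in and out of the byte buffer with memcpy, as the question already does for its struct. The sketch below assumes the question's struct thing; the helper names store_thing and load_thing are hypothetical. The buffer is only ever touched as bytes, and the value is read back into a real struct thing object before its members are used.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct thing {
    uint64_t x;
    uint64_t y;
};

/* Write a struct thing into raw bytes at dst. */
void store_thing(unsigned char *dst, const struct thing *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *src);
}

/* Copy the bytes at src back into a real struct thing object. */
struct thing load_thing(const unsigned char *src)
{
    struct thing out;
    assert(src != NULL);
    memcpy(&out, src, sizeof out);
    return out;
}
```

Because the copies are byte-wise, this also works when the location inside the buffer is not suitably aligned for struct thing. The trade-off is that every access goes through a copy instead of a struct thing* into the buffer.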
Upvotes: 9
Reputation: 180048
Why are the aliasing rules such that one cannot write this program?
The strict-aliasing rule, which has been in every version of ISO C published to date, does not say that you cannot write the program, or even that a C implementation cannot accept it and execute it with the effect you seem to want. Rather, this is one of the comparatively many places where the specification holds that the program, though syntactically correct and satisfying (I think) all language constraints, has undefined behavior.
There are various reasons for the spec to leave program behavior undefined under some circumstances, or, as in this case, to explicitly specify that it is undefined. In the case of the strict aliasing rule, the rationale document for C99 (there is no such document for more recent versions of the specification) speaks to this decision:
The types of lvalues that may be used to access an object have been restricted so that an optimizer is not required to make worst-case aliasing assumptions
(p. 59)
That full discussion is too much to quote here (it's a bit more than a full page of the document), but you may find it of interest.
Would it be possible to amend the standard to permit such a program or is this a hopeless endeavor?
The ISO has a working group devoted to maintaining the language specification, and it releases revisions from time to time. In principle, then, it is possible that such a change could be made. In practice, it is doubtful that this particular change would be accepted because it would have wide-ranging impact for comparatively small gain.
Upvotes: 5