Leandros
Leandros

Reputation: 16825

Valid programs in C89, but not in C99

Are there features / semantics introduced, or removed, in C99 which would make a well defined program written in C89 either

My findings so far, concerning plainly invalid programs:

Subtle changes, making the same code having different semantics:

What have I missed?

Upvotes: 19

Views: 1758

Answers (2)

VonC
VonC

Reputation: 1329672

As an element of contrast and comparison, the git/git codebase remains strictly conform to C89 and does not use C99 initializers, or features from newer C standard.
This is detailed in Git 2.23 (Q3 2019) in Git Coding Guidelines.

This answer illustrates post-C89 feature that might be compatible with C89.

See commit cc0c429 (16 Jul 2019) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit fe9dc6b, 25 Jul 2019)

CodingGuidelines: spell out post-C89 rules

Even though we have been sticking to C89, there are a few handy features we borrow from more recent C language in our codebase after trying them in weather balloons and saw that nobody screamed.

Spell them out.

While at it, extend the existing variable declaration rule a bit to read better with the newly spelled out rule for the for loop.

The coding guidelines now include:

You should not use features from newer C standard, even if your compiler groks them.

There are a few exceptions to this guideline:

  • since early 2012 with e1327023ea (Git v1.7.9.2), we have been using an enum definition whose last element is followed by a comma.
    This, like an array initializer that ends with a trailing comma, can be used to reduce the patch noise when adding a new identifer at the end.

  • since mid 2017 with cbc0f81d (Git v2.15.0-rc0), we have been using designated initializers for struct (e.g. "struct t v = { .val = 'a' };")
    There are certain C99 features that might be nice to use in our code base, but we've hesitated to do so in order to avoid breaking compatibility with older compilers.
    But we don't actually know if people are even using pre-C99 compilers these days.
    If this patch can survive a few releases without complaint, then we can feel more confident that designated initializers are widely supported by our user base.
    It also is an indication that other C99 features may be supported, but not a guarantee (e.g., gcc had designated initializers before C99 existed).

  • since mid 2017 with 512f41cf (Git v2.15.0-rc0), we have been using designated initializers for array (e.g. "int array[10] = { [5] = 2 }").
    This is another test balloon to see if we get complaints from people whose compilers do not support designated initializer for arrays.
    These used to be forbidden, but we have not heard any breakage report, and they are assumed to be safe.

  • Variables have to be declared at the beginning of the block, before the first statement (i.e. -Wdeclaration-after-statement).

  • Declaring a variable in the for loop "for (int i = 0; i < 10; i++)" is still not allowed in this codebase.

Upvotes: -1

supercat
supercat

Reputation: 81337

There are a lot of programs which would have been considered valid under C89, prior to the publication of C99, which some people insist were never valid. C89 includes a rule that requires that an object of any type may only be accessed using a pointer of that type, a related type, or a character type. Prior to the publication of C99, this rule was generally interpreted as applying only to "named" objects (variables of static or automatic duration which are accessed directly by name), and only in situations where the object in question didn't have its address taken immediately before it was used as a different pointer type. Such interpretation was motivated by a number of factors:

  1. One of the stated goals of the Standard was to fit with what existing compilers and programs were doing, and while it would have been rare for existing programs to access discrete named variables using pointers of different types other than in cases where the variable's address was taken immediately before such use, many other usages of pointer type punning were quite common.

  2. The rationale for the Standard includes as its sole example a function which receives a pointer of one primitive type to write a global variable of another primitive type in such a way that a compiler would have no particular reason to expect aliasing. Being able to keep global variables in registers is clearly a useful optimization, and the stated purpose of the rule is to allow such optimizations in cases where a compiler would have no reason to expect aliasing to occur. Outlawing constructs like like (int*)&foo=23; does nothing to aid such optimizations, since the fact that code is taking foo's address and dereferencing it should make it abundantly clear to any compiler that isn't being deliberately obtuse that the code is going to modify foo.

  3. There are many kinds of code which require semantically the ability to use memory bits as various types, and nothing in the Standard indicate that the rules were intended to make programmers jump through hoops (e.g. by using memcpy) to achieve semantics that could have been easily obtained in the absence of the rules, especially considering that using memcpy would prevent the compiler from keeping global variables in registers across the pointer accesses (thus defeating the purpose for which the rules were written in the first place).

  4. If structure types V and W have a common initial sequence, U is any union type containing both, and p is a V* which identifies the V within a U, then (W*)(U*)p may be used to access those common members, and will be equivalent to (W*)p. Unless a compiler could show that p couldn't possibly be a pointer to a member of some union containing W, it would be required to allow (W*)p to access the common members; it was more helpful to simply treat such common member access as being legitimate regardless of whether or where U might exist than to search for excuses to deny it.

  5. Nothing in the C89 rules makes clear how the "type" of a region of allocated storage is defined, or how storage which holds things of one type that are no longer needed might be re-purposed to hold things of another.

  6. Keeping track of registers allocated to named variables was easier than keeping track of registers allocated to other pointer exceptions, and code which was interested in minimizing the number of loads and stores via pointers would often copy things to named variables and work on them there.

C99 added "effective type" rules which are explicitly applicable to allocated storage. Some people insist those were merely "clarifications" of rules which already existed in C89, but for the above reasons I find that viewpoint untenable. It's fashionable to claim that the only reasons compilers didn't apply aliasing rules to unnamed objects are #5 and #6, but objections #1-#4 are equally significant (and continue to apply to C99 just as much as C89). Still, since C99 added the effective type rules, many constructs which would have been treated as legitimate by most common interpretations of the C89 rules are clearly forbidden.

Upvotes: 4

Related Questions