Reputation: 905
Does the C standard require that compilers be able to handle source files that are not encoded as ASCII? Specifically, I am wondering whether UTF-8 source files are standards-compliant. Does the answer differ between C89, C99 and C11?
Assuming that it is legal to use characters from outside of ASCII in C source files, which usages are legal?
I can think of a few distinct use cases: macro names, comments, identifiers, and string literals.
Here is an example showing all four:
#ifdef PRINT_©
// Print out the © notice
const char my©Notice[] = "This program is © 2016 ACME INC";
puts(my©Notice);
#endif
If C allows non-ASCII characters in the usages listed above, are there any restrictions on which code points may be used?
Keep in mind that this is a question about the C standards. I realize that putting Unicode characters into identifiers and macros will make the code more difficult to use.
Upvotes: 1
Views: 631
Reputation: 6887
The encoding of source files is implementation-defined, so the standard does not require compilers to accept any particular encoding, UTF-8 included.
I know of at least one compiler, namely clang, that requires the source to be UTF-8. Other compilers may have different requirements, or may not accept non-ASCII source at all.
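Because the file encoding is up to the implementation, one way to sidestep it is to keep the file pure ASCII and spell the character as a universal character name. A minimal sketch, assuming a C11 compiler (the `u8` prefix is C11); the function names are made up for illustration:

```c
#include <stddef.h>

/* The source text below is pure ASCII: \u00A9 names the copyright sign,
   and the u8 prefix (C11) asks for the literal to be encoded as UTF-8,
   independent of how the compiler decodes the source file. */
static const unsigned char *copyright_bytes(void)
{
    return (const unsigned char *)u8"\u00A9";
}

/* Number of bytes in the encoded character, excluding the terminating NUL. */
size_t copyright_utf8_len(void)
{
    return sizeof(u8"\u00A9") - 1;
}

/* Byte i of the encoded character (hypothetical helper for inspection). */
unsigned char copyright_utf8_byte(size_t i)
{
    return copyright_bytes()[i];
}
```

U+00A9 encodes to the two bytes 0xC2 0xA9 in UTF-8, so the length helper returns 2 whatever the encoding of the source file.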
Since C99, identifiers may contain multibyte characters; before C99, accepting non-basic characters there would have been a compiler extension. C11 expanded the set of allowed characters.
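As an illustration of non-basic characters in identifiers, here is a small sketch that writes them as universal character names, so the file itself stays ASCII. It assumes a compiler that implements this C99 feature; the identifiers are invented for the example:

```c
/* caf\u00E9_total spells "café_total": U+00E9 is a letter that the
   standard permits in identifiers. Assumption: the compiler supports
   universal character names in identifiers (a C99 feature). */
int caf\u00E9_total(int a, int b)
{
    int r\u00E9sultat = a + b;   /* "résultat", a local with a non-basic character */
    return r\u00E9sultat;
}
```

Whether the same identifier may also be typed directly as `café_total` depends on how the compiler maps the bytes of the source file, which is the implementation-defined part.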
There are additional restrictions on which characters are allowed in identifiers, and © is not in the list. The list is in Annex D of the standard. Its entries are Unicode code points, but that doesn't strictly mean the encoding of the file has to be Unicode-based.
Ranges of characters allowed
Ranges of characters disallowed initially
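The identifier restriction amounts to a range lookup on code points. The sketch below is only an excerpt: the "allowed" table covers just the C11 Annex D entries up to U+00FF, so it should not be read as the complete list, and the type and function names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range of Unicode code points. */
struct cp_range { uint32_t lo, hi; };

/* EXCERPT ONLY: C11 Annex D entries for allowed characters up to U+00FF. */
static const struct cp_range allowed_excerpt[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF},
    {0x00B2, 0x00B5}, {0x00B7, 0x00BA}, {0x00BC, 0x00BE},
    {0x00C0, 0x00D6}, {0x00D8, 0x00F6}, {0x00F8, 0x00FF},
};

/* C11 Annex D ranges that may not start an identifier (combining marks). */
static const struct cp_range not_initial[] = {
    {0x0300, 0x036F}, {0x1DC0, 0x1DFF}, {0x20D0, 0x20FF}, {0xFE20, 0xFE2F},
};

/* Linear search: true if cp falls inside any of the n ranges. */
static bool in_ranges(uint32_t cp, const struct cp_range *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (cp >= r[i].lo && cp <= r[i].hi) {
            return true;
        }
    }
    return false;
}

/* True if cp is in the excerpted "allowed" table above. */
bool allowed_in_identifier_excerpt(uint32_t cp)
{
    return in_ranges(cp, allowed_excerpt,
                     sizeof allowed_excerpt / sizeof allowed_excerpt[0]);
}

/* True if cp may not be the first character of an identifier. */
bool disallowed_initially(uint32_t cp)
{
    return in_ranges(cp, not_initial,
                     sizeof not_initial / sizeof not_initial[0]);
}
```

With these tables, `allowed_in_identifier_excerpt(0x00A9)` is false for ©, while `allowed_in_identifier_excerpt(0x00E9)` is true for é, and `disallowed_initially(0x0301)` is true for a combining acute accent.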
Upvotes: 3