Reputation: 215090
I came across some weird-looking code. It doesn't even look like C, yet to my surprise it compiles and runs on my C compiler. Is this some non-standard extension to the C language and if so, what is the reason for it?
??=include <stdio.h>
int main()
??<
    const char arr[] =
    ??<
        0xF0 ??! 0x0F,
        ??-0x00,
        0xAA ??' 0x55
    ??>;
    for(int i=0; i<sizeof(arr)/sizeof(*arr); i++)
    ??<
        printf("%X??/n", (unsigned char)arr??(i??));
    ??>
    return 0;
??>
Output:
FF
FF
FF
Upvotes: 7
Views: 1161
Reputation: 107
This is obfuscated code that relies on the “??”-based trigraphs, which the 1989 ANSI C standard formally defined and later standards retained until C23 (declaring i inside the for statement additionally requires C99). The choice of 0xF0 ??! 0x0F etc. is evidently meant for obfuscation or compiler testing, since there is no plausible reason to bitwise-OR trivial literals in an initializer in production code.
The C89 Committee's stated motivation for introducing trigraphs was portability. The problem with # [ ] { } \ | ~ ^ was not confined to ISO 646 encodings and should be understood in the context of the diverse computing platforms of the 1980s, including Soviet computers (some of which used KOI-7), IBM mainframes (which used various dialects of EBCDIC), Commodore computers (which used PETSCII), Atari computers (which used ATASCII), etc. In hindsight, the decision arguably came late.
Upvotes: -3
Reputation: 215090
The code is standard-compliant C from C99 (needed for the declaration inside the for statement) through C17; C23 removed trigraphs. The ?? mechanism is called trigraphs and was introduced to C to provide an alternative way of writing certain characters in source code. It looks like the program was written as a demonstration of various trigraph sequences.
Back in the day, many computers and their keyboards were based on national variants of an old 7-bit character set called ISO 646, which didn't contain all the symbols used in the C language, such as \ { } [ ]. This made it impossible for programmers in some countries to even write C, because their national keyboard layout lacked the necessary symbols. Instead of remaking the keyboards and character sets, the C language was changed.
Therefore trigraphs were introduced. Today they are considered a completely obsolete feature and their use is not recommended.[1] GCC, for example, can warn about them (-Wtrigraphs). Still, they remained in the C standard up to C17 for backwards compatibility, and conforming compilers for those versions must support them.
The existing trigraph sequences are (C11 5.2.1.1 Trigraph sequences):
??= #
??( [
??/ \
??) ]
??' ^
??< {
??! |
??> }
??- ~
The left column is the trigraph sequence and the right column is its meaning.
EDIT: Those interested in the historical decisions can read about them in the C Rationale v5.10, section 5.2.1.1.
[1]: C23 removed trigraphs from the language standard entirely.
Upvotes: 15