Reputation: 122453
There are four special non-alphabet characters that need to be escaped in C/C++: the single quote \', the double quote \", the backslash \\, and the question mark \?. That's apparently because they have special meanings: ' delimits character constants, " delimits string literals, and \ starts an escape sequence. But why is ? one of them?
I read the table of escape sequences in a textbook today and realized that I've never escaped ? before and have never run into a problem with it. Just to be sure, I tested it under GCC:
#include <stdio.h>

int main(void)
{
    printf("question mark ? and escaped \?\n");
    return 0;
}
And the C++ version:
#include <iostream>

int main(void)
{
    std::cout << "question mark ? and escaped \?" << std::endl;
    return 0;
}
Both programs output: question mark ? and escaped ?
So I have two questions:
1. Why is \? one of the escape sequence characters?
2. Why does non-escaping ? work fine? There's not even a warning.
The more interesting fact is that the escaped \? can be used the same as ? in some other languages as well. I tested this in Lua and Ruby, and it holds there too, even though I didn't find it documented.
Upvotes: 49
Views: 12080
Why is \? one of the escape sequence characters?
Because of trigraphs. In the very first phase of translation, the C/C++ compiler replaces each of the following three-character sequences with the corresponding single character (C11 §5.2.1.1 and C++11 §2.3), and it does so even inside string literals and character constants:
Trigraph:    ??(  ??)  ??<  ??>  ??=  ??/  ??'  ??!  ??-
Replacement:  [    ]    {    }    #    \    ^    |    ~
Trigraphs are nearly useless today and mostly show up in deliberately obfuscated code; the IOCCC has some examples. C++17 removed them from the language, and C23 did the same for C.
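GCC ignores trigraphs by default (in its GNU dialect modes) and warns you if there's a trigraph in the code, unless trigraphs are enabled with -trigraphs or by a strict ISO mode such as -std=c11. With trigraphs enabled, the second \? is the one that matters in the following example:
printf("\?\?!\n");
If the question marks were not escaped, ??! would be replaced by | and the output would be |. Escaping only the first one is not enough: because trigraph replacement happens before escape sequences are interpreted, \??! still contains the trigraph ??!.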
For more information on trigraphs, see Cryptic line "??!??!" in legacy code
Why does non-escaping ? work fine? There's not even a warning.
Because the standard allows ? (and the double quote ") to be represented by themselves:
C11 §6.4.4.4 Character constants Section 4
The double-quote " and question-mark ? are representable either by themselves or by the escape sequences \" and \?, respectively, but the single-quote ' and the backslash \ shall be represented, respectively, by the escape sequences \' and \\.
C++ says much the same:
C++11 §2.13.2 Character literals Section 3
Certain nongraphic characters, the single quote ', the double quote ", the question mark ?, and the backslash \, can be represented according to Table 6. The double quote " and the question mark ?, can be represented as themselves or by the escape sequences \" and \? respectively, but the single quote ' and the backslash \ shall be represented by the escape sequences \' and \\ respectively. If the character following a backslash is not one of those specified, the behavior is undefined. An escape sequence specifies a single character.
Upvotes: 52