Reputation: 1955
In C, one can pass Unicode characters to printf() like this:
printf("some unicode char: %c\n", "\u00B1");
But the problem is that on POSIX-compliant systems `char` is always 8 bits, while most characters, such as the one above, take more than one byte in UTF-8. They don't fit into a `char`, and as a result nothing is printed on the terminal. However, I can achieve the effect like this:
printf("some unicode char: %s\n", "\u00B1");
The %s conversion prints every byte of the string, so the Unicode character appears on the terminal. Also, the standard says:
If the hexadecimal value for a universal character name is less than 0x20 or in the range 0x7F-0x9F (inclusive), or if the universal character name designates a character in the basic source character set, then the program is ill-formed.
When I do this:
printf("letter a: %c\n", "\u0061");
gcc says:
error: \u0061 is not a valid universal character
So this technique also can't be used to print ASCII characters. The Wikipedia article at http://en.wikipedia.org/wiki/Character_(computing)#cite_ref-3 says:
A char in the C programming language is a data type with the size of exactly one byte, which in turn is defined to be large enough to contain any member of the basic execution character set and UTF-8 code units.
But is this doable on POSIX systems?
Upvotes: 1
Views: 1350
Reputation: 215407
Using universal characters in byte-based strings depends on the compile-time and run-time character encodings matching, so it's generally not a good idea except in certain situations. However, they work very well in wide string and wide character literals:
printf("%ls", L"\u00B1");
or
printf("%lc", L'\u00B1');
will print U+00B1 in the correct encoding for your locale, provided the program has first called setlocale(LC_ALL, "") so that the locale is taken from the environment. In the default "C" locale the conversion typically fails.
Upvotes: 3