user1042840

Reputation: 1955

What's the use of universal characters on a POSIX system?

In C one can pass Unicode characters to printf() like this:

printf("some unicode char: %c\n", "\u00B1");

But the problem is that on POSIX-compliant systems `char` is always 8 bits, and most UTF-8 characters, such as the one above, are wider than that and don't fit into a char. As a result, nothing is printed on the terminal. I can do this to get the intended output, however:

printf("some unicode char: %s\n", "\u00B1");

The %s placeholder is expanded automatically and the Unicode character is printed on the terminal. Also, the standard says:

If the hexadecimal value for a universal character name is less than 0x20 or in the range 0x7F-0x9F (inclusive), or if the universal character name designates a character in the basic source character set, then the program is ill-formed.

When I do this:

printf("letter a: %c\n", "\u0061");

gcc says:

error: \u0061 is not a valid universal character

So this technique is also unusable for printing ASCII characters. The Wikipedia article at http://en.wikipedia.org/wiki/Character_(computing)#cite_ref-3 says:

A char in the C programming language is a data type with the size of exactly one byte, which in turn is defined to be large enough to contain any member of the basic execution character set and UTF-8 code units.

But is this doable on POSIX systems?

Upvotes: 1

Views: 1350

Answers (1)

R.. GitHub STOP HELPING ICE

Reputation: 215407

Using universal character names in byte-based (narrow) strings only works if the compile-time and run-time character encodings match, so it's generally not a good idea except in certain situations. However, they work very well in wide string and wide character literals: printf("%ls", L"\u00B1"); or printf("%lc", L'\u00B1'); will print U+00B1 in the correct encoding for your locale, provided the program has first selected that locale with setlocale(LC_ALL, "").
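To make the difference concrete, here is a minimal sketch rather than a definitive implementation. The function names are made up for illustration, and the byte count in the comments assumes a UTF-8 execution character set, which is the default for GCC and Clang but is not guaranteed by the C standard.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of bytes the compiler stored for the narrow
   literal "\u00B1". Assuming a UTF-8 execution character set this is 2
   (0xC2 0xB1), which is why the character cannot travel through %c as a
   single char. */
size_t narrow_literal_bytes(void)
{
    return strlen("\u00B1");
}

/* Hypothetical helper: number of wchar_t units in the wide literal.
   U+00B1 is in the Basic Multilingual Plane, so it occupies one unit
   whether wchar_t is 16 or 32 bits wide. */
size_t wide_literal_units(void)
{
    return wcslen(L"\u00B1");
}

/* Hypothetical helper: print U+00B1 from wide literals. printf converts
   them to the locale's multibyte encoding at run time.
   Returns 0 on success, -1 on failure. */
int print_plus_minus(void)
{
    /* Adopt the user's locale. Without this call the program stays in the
       "C" locale, where the conversion of U+00B1 may fail. */
    if (setlocale(LC_ALL, "") == NULL)
        return -1;
    if (printf("%ls\n", L"\u00B1") < 0)
        return -1;
    /* %lc expects a wint_t argument, hence the cast. */
    if (printf("%lc\n", (wint_t)L'\u00B1') < 0)
        return -1;
    return 0;
}
```

Calling print_plus_minus() from main() should print the character twice in a locale whose encoding can represent it. In a locale that cannot represent U+00B1, printf reports an error instead of emitting bytes the terminal would misinterpret, which is the advantage over embedding fixed bytes in a narrow string.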

Upvotes: 3
