ZijingWu

Reputation: 3490

How do POSIX systems support Unicode?

I have seen many APIs on POSIX systems (for example Linux, Mac, and Android) that accept a const char* as the argument for a file path.

One example is dlopen: as the documentation shows, its first argument is a const char*. Does it support Unicode file paths, for example a path containing Chinese characters?

Upvotes: 0

Views: 2273

Answers (2)

Brian Bi

Reputation: 119382

POSIX does not require systems to support Unicode filenames (see https://stackoverflow.com/a/2306003/481267). However, provided that the names are encoded in UTF-8, there are no technical obstacles to supporting Unicode. Many modern file systems allow any byte in a file name except \0 and /.

The POSIX API deals with null-terminated byte sequences, and when a string is encoded in UTF-8, no code point's representation contains a zero byte. Furthermore, all characters outside the ASCII range (0x00-0x7f) are encoded entirely using bytes with the high order bit set (0x80-0xff) so there is no chance that the system will be confused into thinking that there is a directory separator in the middle of some Unicode character.
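To make the byte-level argument concrete, here is a minimal sketch in C. The helper names count_slash_bytes and roundtrip_file are made up for illustration, and the sketch assumes a file system that accepts arbitrary non-NUL, non-slash bytes in names and a writable current directory. The file name is written with hex escapes so the bytes are UTF-8 regardless of the source file's encoding.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: count the bytes in s that equal '/' (0x2f).
 * The loop stops at the first 0x00 byte, exactly as the POSIX API would. */
size_t count_slash_bytes(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != 0; ++p) {
        if (*p == '/')
            ++n;
    }
    return n;
}

/* Hypothetical helper: create a file from a plain byte-string path,
 * check that it exists, then remove it.
 * Returns 0 on success and -1 on any failure. */
int roundtrip_file(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    if (access(path, F_OK) != 0)
        return -1;
    return unlink(path);
}
```

For example, the UTF-8 encoding of a two-character Chinese name followed by ".txt" is "\xe4\xb8\xad\xe6\x96\x87.txt". strlen reports 10 bytes for it (three per Chinese character plus four ASCII bytes), count_slash_bytes reports 0, and on a typical Linux file system roundtrip_file should accept it unchanged. The same reasoning applies to the const char* passed to dlopen, since it is handed to the system as the same kind of byte string.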

Upvotes: 3

user3159253

Reputation: 17455

Modern Linux/Unix systems generally assume that Unicode filenames are expressed in UTF-8, which is a byte-oriented encoding (although some underlying filesystems store filenames internally in UTF-16).
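As a rough way to see which encoding the current locale uses, a program can ask the C library. The sketch below is only illustrative: current_codeset is a made-up helper name, and the string it returns depends on the environment (LANG, LC_ALL and so on), so no particular value is guaranteed.

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: adopt the locale from the environment and return
 * the name of its character encoding, e.g. "UTF-8" on many modern systems.
 * The returned string is owned by the C library and must not be freed. */
const char *current_codeset(void)
{
    setlocale(LC_ALL, "");        /* use the user's environment settings */
    return nl_langinfo(CODESET);  /* encoding name for the active locale */
}
```

Note that the kernel does not consult this setting when it handles a path. It only tells user-space programs how to interpret and display the bytes of a filename.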

Upvotes: 2
