Reputation: 10332
It is known that in C, a string is represented by an array of chars. And on most 32-bit processors, a char takes one byte, or eight bits, so a string consists of an array of one-byte elements.
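For reference, those sizes can be checked directly; a minimal sketch using the sizeof operator and the standard CHAR_BIT macro from <limits.h>:

    #include <stdio.h>
    #include <limits.h>

    int main(void) {
        /* sizeof(char) is 1 by definition; CHAR_BIT is the number of
           bits per byte, which is 8 on all common platforms. */
        printf("sizeof(char) = %zu\n", sizeof(char));
        printf("CHAR_BIT     = %d\n", CHAR_BIT);
        return 0;
    }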
Because extended characters like Chinese and Japanese take up more than 8 bits, I am getting a little confused about this.
For example, I tested that I can define an array of Chinese characters the same way an array of English letters is defined, using syntax like char array[100] (see the sketch after the question). So my question is:
Is there a mechanism that bridges the gap between ordinary 8-bit characters and wider characters so that they can be treated the same way, as in the example above?
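For reference, here is roughly what that test looks like (a minimal sketch, assuming the source file and compiler both use UTF-8; strlen then reports bytes, not characters):

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* With a UTF-8 source file, each Chinese character below
           is stored as 3 bytes in the char array. */
        char english[100] = "hello";
        char chinese[100] = "你好";  /* two characters */

        printf("english: %zu bytes\n", strlen(english)); /* prints 5 */
        printf("chinese: %zu bytes\n", strlen(chinese)); /* prints 6, not 2 */
        return 0;
    }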
Upvotes: 4
Views: 1836
Reputation:
I'd suggest using the UTF-8 string encoding, as it makes it possible to use ordinary characters (bytes <= 127) as usual, while two-, three-, or four-byte characters are encoded entirely as bytes with the high bit set (byte >= 128), so they can be detected reliably. You can also use libiconv for converting between encodings: http://www.gnu.org/software/libiconv/
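To illustrate that detection, a minimal sketch (my own, not from libiconv) that counts code points in a UTF-8 string by skipping continuation bytes, which always have the bit pattern 10xxxxxx:

    #include <stdio.h>
    #include <string.h>

    /* Count every byte that is NOT a continuation byte (10xxxxxx),
       i.e. every byte that starts a new code point. */
    static size_t utf8_strlen(const char *s) {
        size_t n = 0;
        for (; *s; s++)
            if (((unsigned char)*s & 0xC0) != 0x80)
                n++;
        return n;
    }

    int main(void) {
        const char *mixed = "abc你好"; /* 3 ASCII + 2 Chinese characters */
        printf("bytes: %zu, code points: %zu\n",
               strlen(mixed), utf8_strlen(mixed)); /* 9 bytes, 5 code points */
        return 0;
    }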
Upvotes: 0