Reputation: 8315
I've read that Windows CE uses the "UTF-16 version of Unicode" (I'm a newbie with encodings).
What happens when a string contains a character that requires more than 2 bytes, like Chinese characters? Does it take 3? If I have a string containing Chinese characters, does that mean accessing the N-th pair of bytes will not necessarily give me the N-th visible symbol?
Also, what about performance? If I understand correctly, encodings with a variable number of bytes per visible symbol require scanning the string from the beginning to access the N-th visible symbol, right? If so, is this also true for UTF-16?
Thank you.
Upvotes: 0
Views: 284
Reputation: 151586
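What happens when a string contains a character that requires more than 2 bytes, like Chinese characters? Does it take 3?
No, four. Most Chinese characters in everyday use lie in the Basic Multilingual Plane (U+0000 to U+FFFF) and take two bytes, but characters outside it, such as those in CJK Unified Ideographs Extension B, take four.
In UTF-16, code points greater than or equal to 2^16 (U+10000) are encoded using two 16-bit code units, called a surrogate pair.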
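A minimal Python 3 sketch of the surrogate-pair rule, for illustration only; the function name utf16_code_units is hypothetical. It shows that a common Chinese character such as 中 (U+4E2D) fits in one 16-bit code unit, while U+20000 (CJK Extension B) needs two.

```python
def utf16_code_units(cp: int) -> list:
    """Return the UTF-16 code units for one Unicode code point (hypothetical helper)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        # Basic Multilingual Plane: a single 16-bit code unit (2 bytes)
        return [cp]
    v = cp - 0x10000             # remaining 20-bit value
    high = 0xD800 + (v >> 10)    # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (v & 0x3FF)   # low (trail) surrogate: bottom 10 bits
    return [high, low]

# "A", 中 (U+4E2D) and U+20000 from CJK Extension B
for ch in ["A", "\u4e2d", "\U00020000"]:
    units = utf16_code_units(ord(ch))
    print(f"U+{ord(ch):04X}", [f"{u:04X}" for u in units], 2 * len(units), "bytes")
# U+0041 ['0041'] 2 bytes
# U+4E2D ['4E2D'] 2 bytes
# U+20000 ['D840', 'DC00'] 4 bytes
```

The result can be cross-checked against Python's own codec, e.g. `"\U00020000".encode("utf-16-be")` gives the bytes D8 40 DC 00.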
If I understand correctly, encodings with a variable number of bytes per visible symbol require scanning the string from the beginning to access the N-th visible symbol, right?
Yes, and UTF-16 is one of them, since each code point takes either one or two 16-bit code units. See, for example, the question "Why use multibyte string functions in PHP?".
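To make this concrete for UTF-16, here is a minimal Python sketch, assuming the string is held as a list of 16-bit code units (roughly what a 16-bit wchar_t buffer holds on Windows); the helper names are hypothetical. Finding where the N-th code point starts means walking from the beginning, because every earlier code point may occupy one or two units. Note that this locates code points, not visible symbols: a visible symbol can be several code points (for example a base letter plus a combining accent), which needs further work on top of this.

```python
import struct

def to_code_units(s):
    """Split a str into its UTF-16 code units (little-endian decode of the bytes)."""
    data = s.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))

def code_unit_offset(units, n):
    """Return the index in `units` where the n-th (0-based) code point starts.

    Runs in O(n): there is no way to jump straight there, because each
    earlier code point may take 1 or 2 code units.
    """
    i = 0
    for _ in range(n):
        if i >= len(units):
            raise IndexError("string has fewer than n + 1 code points")
        # A high surrogate (0xD800-0xDBFF) starts a two-unit pair.
        i += 2 if 0xD800 <= units[i] <= 0xDBFF else 1
    if i >= len(units):
        raise IndexError("string has fewer than n + 1 code points")
    return i

text = "a\U00020000b\u4e2d"          # 'a', U+20000, 'b', 中
units = to_code_units(text)
print(len(text), len(units))         # 4 code points, 5 code units
print(code_unit_offset(units, 2))    # 'b' starts at unit 3, not 2
```

Naively indexing `units[2]` would land on the second half of the surrogate pair instead of on 'b'.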
Upvotes: 1