Virus721

Reputation: 8315

Windows CE / UTF-16 / Chinese

I've read that Windows CE uses the "UTF-16 version of Unicode" (I'm a newbie with encodings).

What happens when a string contains a character that requires more than 2 bytes, like Chinese characters? Does it take 3? If I have a string containing Chinese characters, is it true that accessing the N-th pair of bytes will not necessarily access the N-th visible symbol?

Also, what about performance? If I understand correctly, encodings that use a variable number of bytes per visible symbol require the string to be scanned from the beginning to access the N-th visible symbol, right? If so, is that also true for UTF-16?

Thank you.

Upvotes: 0

Views: 284

Answers (1)

CodeCaster

Reputation: 151586

What happens when a string contains a character that requires more than 2 bytes, like Chinese characters? Does it take 3?

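No, four. A character that does not fit in a single 16-bit code unit is encoded as a pair of them (a surrogate pair). Note that the most commonly used Chinese characters (the CJK Unified Ideographs block, U+4E00 to U+9FFF) lie below that limit and take only 2 bytes; only the rarer ones in the supplementary planes take 4.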

Wikipedia: UTF-16:

In UTF-16, code points greater or equal to 2^16 are encoded using two 16-bit code units.
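
To make the quoted rule concrete, here is a minimal sketch in Python (the function name `utf16_code_units` is made up for this illustration; it is not a Windows CE API). It shows how a single code point maps to one or two 16-bit code units:

```python
def utf16_code_units(code_point):
    """Return the list of 16-bit UTF-16 code units for one code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x10000:
        # Below 2^16: a single 16-bit code unit (2 bytes).
        return [code_point]
    # 2^16 or above: split the 20-bit offset into a surrogate pair (4 bytes).
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


print([hex(u) for u in utf16_code_units(0x4E2D)])   # U+4E2D, a common Chinese character
print([hex(u) for u in utf16_code_units(0x20000)])  # U+20000, a rare CJK Extension B character
```

The first call yields one code unit, the second yields the pair 0xD840, 0xDC00.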


If I understand correctly, encodings that use a variable number of bytes per visible symbol require the string to be scanned from the beginning to access the N-th visible symbol, right?

Yes, and that applies to UTF-16 as well, since a character takes either one or two 16-bit code units. See for example "Why use multibyte string functions in PHP?".
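
As a rough illustration of why the scan is needed, here is a sketch in Python that finds the N-th code point in a sequence of UTF-16 code units. The helper name `nth_code_point` is hypothetical, and the code assumes well-formed input (every high surrogate is followed by a low surrogate):

```python
def nth_code_point(units, n):
    """Return the n-th (0-based) code point in a sequence of UTF-16 code units.

    Assumes well-formed UTF-16; walks from the start because each
    code point occupies either one or two code units.
    """
    i = 0
    while i < len(units):
        unit = units[i]
        is_pair = 0xD800 <= unit <= 0xDBFF  # high surrogate starts a pair
        if n == 0:
            if is_pair:
                return 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            return unit
        i += 2 if is_pair else 1
        n -= 1
    raise IndexError("code point index out of range")


# U+4E2D, U+20000 (a surrogate pair), U+6587
units = [0x4E2D, 0xD840, 0xDC00, 0x6587]
print(hex(units[2]))                    # third code unit: a lone low surrogate
print(hex(nth_code_point(units, 2)))    # third code point: U+6587
```

Indexing the code units directly lands in the middle of the surrogate pair, while the scan returns the intended character. Even a code point is not always one visible symbol, since combining marks can attach to a preceding character, so counting what the user sees takes more work still.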

Upvotes: 1
