Ray

Reputation: 153

UTF-16 string terminator

What is the string terminator sequence for a UTF-16 string?

EDIT:

Let me rephrase the question in an attempt to clarify. How does the call to wcslen() work?

Upvotes: 15

Views: 12392

Answers (3)

Michael Petrotta

Reputation: 60902

Unicode does not define string terminators. Your environment or language does. For instance, C strings use 0x0 as a string terminator, whereas .NET strings use a separate value in the String class to store the length of the string.

To answer your second question, wcslen looks for a terminating L'\0' character. As I read it, that is a single wchar_t with the value zero, so its size in bytes depends on the compiler; it will likely be the two-byte sequence 0x00 0x00 if you're using UTF-16 (encoding U+0000, 'NUL').
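To make the scan concrete, here is a minimal sketch of what a wcslen-style loop could look like. The name my_wcslen is hypothetical and this is not how any particular C library implements it; it only illustrates that the comparison is done on whole wchar_t units rather than on single bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical re-implementation of wcslen, for illustration only.
   Counts the wchar_t units that precede the first unit equal to L'\0'. */
size_t my_wcslen(const wchar_t *s)
{
    assert(s != NULL);

    const wchar_t *p = s;
    while (*p != L'\0')   /* compares a whole wchar_t, not one byte */
        ++p;

    return (size_t)(p - s);
}
```

Calling my_wcslen(L"ab") should give the same result as wcslen(L"ab"). A byte-wise strlen on the same buffer would typically stop early, because the wide encoding of 'a' already contains a 0x00 byte.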

Upvotes: 17

pmg

Reputation: 108968

7.24.4.6.1 The wcslen function (from the Standard)

...

   [#3]   The  wcslen  function  returns  the  number  of  wide
   characters that precede the terminating null wide character.

And the null wide character is L'\0'.

Upvotes: 5

Darin Dimitrov

Reputation: 1038710

There isn't any. String terminators are not part of an encoding.

For example, if you had the string ab, it would be encoded in UTF-16 (little-endian) with the following sequence of bytes: 61 00 62 00. And if you had 大家 you would get 27 59 B6 5B. So, as you can see, there is no predetermined terminator sequence.
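The byte sequences above can be reproduced with a short sketch. The helper utf16le_bytes is a hypothetical name, and it assumes the input is already a list of UTF-16 code units (0x0061 0x0062 for ab, 0x5927 0x5BB6 for 大家) that should be written in little-endian order. Note that it emits exactly two bytes per code unit and nothing else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Writes `count` UTF-16 code units to `out` as little-endian bytes and
   returns the number of bytes written (2 * count).
   No terminator is appended: the encoding itself does not define one. */
size_t utf16le_bytes(const uint16_t *units, size_t count, unsigned char *out)
{
    assert(units != NULL && out != NULL);

    for (size_t i = 0; i < count; ++i) {
        out[2 * i]     = (unsigned char)(units[i] & 0xFF); /* low byte first */
        out[2 * i + 1] = (unsigned char)(units[i] >> 8);   /* then high byte */
    }
    return 2 * count;
}
```

If the caller wants a C-style terminated buffer, it has to append an extra 0x0000 code unit itself; that is a convention of the language or API, not of UTF-16.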

Upvotes: 4
