user4582812

How to get the size of a UTF-16LE string in bytes?

Say I have the following:

const wchar_t *str = L"Hello World!";

I want to get the size of L"Hello World!" in bytes (not how many characters it consists of).

I have read that wcslen() counts every 2 bytes as 1 character, so if a character takes 4 bytes (a UTF-16 surrogate pair), it will count it as 2 characters!

This is great for me because now I can just do:

size_t size_of_str_in_bytes = wcslen(str) * 2;

But is it guaranteed that wcslen() will always behave this way?

Answers (1)

Lightness Races in Orbit

Well, wcslen always gives you the number of wchar_ts. It's the analogue of strlen.

(Note that, just like strlen, the terminating "null" character is not included!)

That's not quite the same as "counting every 2 bytes as 1 character", though for systems on which wchar_t is 2 bytes wide, the effect would be the same.

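I would use sizeof(wchar_t) instead of 2, though. Y'know, for portability: the width of wchar_t is implementation-defined (2 bytes on Windows, 4 bytes on most Linux and macOS systems), so there is no guarantee that wcslen() counts 2-byte units.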

For example, Coliru's platform has sizeof(wchar_t) == 4:

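#include <cwchar>   // std::wcslen
#include <cstddef>  // std::size_t
#include <cassert>

int main()
{
    const wchar_t* wstr = L"Hello world";   // 11 wchar_ts before the terminator
    // Byte count of the characters only; the terminating null wchar_t is excluded
    const std::size_t size_of_wide_cstr_in_bytes = std::wcslen(wstr) * sizeof(wchar_t);

    assert(sizeof(wchar_t) == 4);             // on this particular system
    assert(size_of_wide_cstr_in_bytes == 44); // on this particular system
}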

