Reputation: 21
I've seen a few other posts on this issue but was unable to find any details on how to determine programmatically whether a code point takes more than one wchar_t (which is 2 bytes on Windows).
An example:
const wchar_t* s2 = L"\U0002008A"; // U+2008A, a Han character outside the BMP
std::wstring in(s2); // length() == 2 on Windows, where wchar_t is 16 bits
I'd like to know how to determine when a character will have a length() > 1.
Upvotes: 2
Views: 962
Reputation: 88155
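Any code point above U+FFFF is encoded in UTF-16 as a surrogate pair, that is, two 16-bit code units. Surrogate values are in the range D800-DFFF: the first unit of a pair (the high surrogate) is in D800-DBFF and the second (the low surrogate) is in DC00-DFFF. So a character has length() > 1 exactly when its code point is above U+FFFF, and you can spot one inside a string by checking whether a code unit falls in the surrogate range.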
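A minimal sketch of those range checks, in case it helps. The function names are made up for illustration and are not part of any library. It uses char16_t and std::u16string instead of wchar_t so that it behaves the same on platforms where wchar_t is 32 bits; on Windows the same comparisons should apply to each wchar_t unit.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `unit` is a high (lead) surrogate, D800-DBFF.
constexpr bool is_high_surrogate(char16_t unit) {
    return unit >= 0xD800 && unit <= 0xDBFF;
}

// Hypothetical helper: true if `unit` is a low (trail) surrogate, DC00-DFFF.
constexpr bool is_low_surrogate(char16_t unit) {
    return unit >= 0xDC00 && unit <= 0xDFFF;
}

// Number of UTF-16 code units needed to encode code point `cp`:
// 2 for anything above U+FFFF, otherwise 1.
constexpr std::size_t utf16_length(char32_t cp) {
    return cp > 0xFFFF ? 2 : 1;
}

// Counts code points in a UTF-16 string, treating each well-formed
// high+low surrogate pair as one. An unpaired surrogate is counted as
// one code point on its own (an assumption; stricter code might reject it).
inline std::size_t count_code_points(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_high_surrogate(s[i]) && i + 1 < s.size() &&
            is_low_surrogate(s[i + 1])) {
            ++i;  // skip the low surrogate that completes this pair
        }
        ++count;
    }
    return count;
}
```

With this, utf16_length(0x2008A) gives 2 for the character in the question, and count_code_points reports 1 for a string that holds only that character even though its size() is 2.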
Upvotes: 5