user2295995

Reputation: 21

How to tell if a wchar_t has a surrogate (UTF-16)?

I've seen a few other posts on this issue but was unable to find any details on how to determine programmatically whether a code point is encoded as more than one wchar_t (wchar_t is 2 bytes on Windows).

An example:

const wchar_t* s2 = L"\U0002008A"; // The "Han" character
std::wstring in(s2);               // length() == 2

I'd like to know how to determine when a character will have a length() > 1.

Upvotes: 2

Views: 962

Answers (1)

bames53

Reputation: 88155

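Any code point above U+FFFF uses a surrogate pair (two 16-bit code units) in its UTF-16 encoding. Surrogate code units are in the range 0xD800-0xDFFF: the first unit of a pair (the high or lead surrogate) is in 0xD800-0xDBFF, and the second (the low or trail surrogate) is in 0xDC00-0xDFFF.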
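A minimal sketch of these checks follows. It uses char16_t and std::u16string rather than wchar_t so it behaves the same on platforms where wchar_t is 32 bits; the helper names (needs_surrogate_pair, is_high_surrogate, decode_pair, count_code_points, and so on) are hypothetical, not part of any standard library. For U+2008A from the question, the pair is 0xD840 0xDC8A.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if the code point lies outside the BMP
// and therefore takes two UTF-16 code units (a surrogate pair).
constexpr bool needs_surrogate_pair(char32_t cp) {
    return cp > 0xFFFF && cp <= 0x10FFFF;
}

// Any surrogate code unit: 0xD800-0xDFFF.
constexpr bool is_surrogate(char32_t u) { return u >= 0xD800 && u <= 0xDFFF; }

// High (lead) surrogate: 0xD800-0xDBFF.
constexpr bool is_high_surrogate(char32_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// Low (trail) surrogate: 0xDC00-0xDFFF.
constexpr bool is_low_surrogate(char32_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Combine a high/low pair back into the code point it encodes.
char32_t decode_pair(char32_t hi, char32_t lo) {
    assert(is_high_surrogate(hi) && is_low_surrogate(lo));
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

// Count code points: a high surrogate followed by a low surrogate counts
// once; every other unit (including an unpaired surrogate) counts once.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i, ++n) {
        if (is_high_surrogate(s[i]) && i + 1 < s.size() && is_low_surrogate(s[i + 1]))
            ++i;  // skip the trail unit of the pair
    }
    return n;
}
```

On Windows, where wchar_t is 16 bits, the same unit checks can be applied to the elements of a std::wstring such as `in` from the question: a character has length() > 1 exactly when its first unit is a high surrogate.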

Upvotes: 5
