I'm trying to find information about the encoding used for L"" strings.
https://learn.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=msvc-160
I know wchar_t isn't tied to one specific encoding, because the implementation can choose any wide encoding. But what encoding does an L"" string actually use? Even the docs leave this out.
auto s2 = L"hello"; // const wchar_t*, encoding not specified: why?
auto s3 = u"hello"; // const char16_t*, encoded as UTF-16
auto s4 = U"hello"; // const char32_t*, encoded as UTF-32
wchar_t is a standard type, but its size and encoding are implementation-defined, so each compiler decides them. Microsoft decided, back when all of Unicode fit into 16-bit quantities, that wchar_t would be 2 bytes and that Windows would use UCS-2. Later, when Unicode grew beyond 16 bits, Windows was updated to use UTF-16, and since Windows ran on little-endian processors, that made it UTF-16LE. wchar_t remained 2 bytes, which is enough for UTF-16 code units, with surrogate pairs representing code points above U+FFFF. So with Microsoft's compiler, an L"" string is encoded as UTF-16.