HelloWorld

Reputation: 1863

What is the encoding behind L"" in Windows?

I'm trying to find information about the encoding behind L"" strings.

https://learn.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=msvc-160

I know the encoding of wchar_t is not fixed by the standard, because it can be any wide-character encoding. But what happens if I use an L"" string? Even the docs leave this information out.

auto s2 = L"hello"; // const wchar_t* <-- it's undefined but why?
auto s3 = u"hello"; // const char16_t*, encoded as UTF-16
auto s4 = U"hello"; // const char32_t*, encoded as UTF-32

Upvotes: 0

Views: 83

Answers (1)

Mark Ransom

Reputation: 308168

wchar_t is a standard type, but its size and encoding are left to each implementation. Microsoft made wchar_t 2 bytes wide back when all of Unicode fit into 16 bits, and Windows used UCS-2. When Unicode later grew beyond 16 bits, Windows switched to UTF-16, and because Windows runs on little-endian processors, that means UTF-16LE. wchar_t stayed 2 bytes wide, which is enough for UTF-16 code units; code points above U+FFFF are represented with surrogate pairs. So with the Microsoft compiler, an L"" literal is encoded as UTF-16LE.
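To make the surrogate-pair point concrete, here is a minimal sketch. The helper names `encode_utf16` and `wide_units` are hypothetical, made up for illustration, and not part of any library. The first shows how a code point above U+FFFF splits into two 16-bit units; the second counts how many wchar_t units the compiler actually stored for a wide literal, which depends on the platform.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Assumes cp is a valid scalar value (not a surrogate, not above U+10FFFF).
std::u16string encode_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // Fits in a single 16-bit unit (the range UCS-2 could already cover).
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Above U+FFFF: split the remaining 20 bits into a surrogate pair.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
    }
    return out;
}

// Hypothetical helper: number of wchar_t units in a wide string literal,
// not counting the terminating null.
template <std::size_t N>
constexpr std::size_t wide_units(const wchar_t (&)[N]) {
    return N - 1;
}
```

For example, `encode_utf16(0x1F600)` yields the two units 0xD83D and 0xDE00. Where wchar_t is 2 bytes, as with the Microsoft compiler, `wide_units(L"\U0001F600")` should be 2 for the same reason; on platforms where wchar_t is 4 bytes it is 1.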

Upvotes: 3
