C++ Literals and Unicode

Question

C++ Literals

Environment:

OS: Windows 10 Pro.
Compiler: GCC latest.
IDE: Code::Blocks latest.
Working on: Console applications.

My understanding is that prefixes on numeric literals are used to determine the type of the value (I am not sure). However, I am very confused about the prefixes and suffixes on character and string literals. I have read a lot and spent days trying to understand the situation, but I ended up with more questions and few answers, so I thought Stack Overflow could be of a lot of help.

Qs:

Q1: What is the correct use of the string prefixes u8, u, U and L?

I have the following code as an example:

#include <iostream>
using namespace std;

int main()
{
    cout << "\n\n Hello World! (plain) \n";
    cout << u8"\n Hello World! (u8) \n";
    cout << u"\n Hello World! (u) \n";
    cout << U"\n Hello World! (U) \n";
    cout << L"\n Hello World! (plain) \n\n";

    cout << "\n\n\n";
}

The output is like this:

Hello World! (plain)

Hello World! (u8)

0x47f0580x47f0840x47f0d8

Q2: Why do U, u and L produce this output? I expected the prefix only to determine the type, not to do any encoding mapping (if that is what is happening).

Q3: Is there a simple, to-the-point reference on encodings such as UTF-8? I am confused about them, and I also doubt that console applications are capable of dealing with them. It seems crucial to understand them.

Q4: I would also appreciate a step-by-step reference that explains custom type (user-defined) literals.

WhiZTiM · Accepted Answer

First see: http://en.cppreference.com/w/cpp/language/string_literal

std::cout's operator<< is overloaded to print const char*. That is why the first two strings are printed.

cout << "\n\n Hello World! (plain) \n";
cout << u8"\n Hello World! (u8) \n";

As expected, prints¹:

Hello World! (plain)

Hello World! (u8)

Meanwhile, std::cout has no special << overload for const char16_t*, const char32_t* or const wchar_t*, so those arguments match the << overload for printing pointers (const void*). That is why:

cout << u"\n Hello World! (u) \n";
cout << U"\n Hello World! (U) \n";
cout << L"\n Hello World! (plain) \n\n";

Prints:

0x47f0580x47f0840x47f0d8

As you can see, three pointer values are actually printed there: 0x47f058, 0x47f084 and 0x47f0d8.

However, for the last one (the L literal), you can get it to print properly using std::wcout:

std::wcout << L"\n Hello World! (plain) \n\n";

Prints:

 Hello World! (plain)

¹ The u8 literal printed as expected because the first 128 code points (U+0000 to U+007F) are encoded in UTF-8 as the same single bytes as ASCII.
