C++ Literals
Environment:
My understanding of numeric literal prefixes is that they help determine the type of the numeric value (I'm not sure). However, I am very confused about character and string literal prefixes and suffixes. I have read a lot and spent days trying to understand the situation, but I ended up with more questions than answers, so I thought Stack Overflow could help.
Questions:
Q1: What is the correct use of the string prefixes u8, u, U and L?
Here is the code I used as an example:
#include <iostream>
#include <string>
using namespace std;
int main()
{
    cout << "\n\n Hello World! (plain) \n";
    cout << u8"\n Hello World! (u8) \n";
    cout << u"\n Hello World! (u) \n";
    cout << U"\n Hello World! (U) \n";
    cout << L"\n Hello World! (plain) \n\n";
    cout << "\n\n\n";
}
The output looks like this:
Hello World! (plain)
Hello World! (u8)
0x47f0580x47f0840x47f0d8
Q2: Why do the u, U and L literals produce this output? I expected the prefixes only to determine the type, not to perform any encoding mapping (if they do).
Q3: Is there a simple, to-the-point reference about encodings such as UTF-8? I am confused about them, and I also doubt that console applications can handle them. I think understanding them is crucial.
Q4: I would also appreciate a step-by-step reference that explains custom (user-defined) literals.
Answer 1:
First see: http://en.cppreference.com/w/cpp/language/string_literal
std::cout's class (std::ostream) overloads operator<< to print a const char* as text. That is why the first two strings are printed:
cout << "\n\n Hello World! (plain) \n";
cout << u8"\n Hello World! (u8) \n";
As expected, this prints (see note 1):
Hello World! (plain)
Hello World! (u8)
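Meanwhile, std::cout's class has no operator<< overload that prints a const char16_t*, const char32_t* or const wchar_t* as text, so each of these pointers is implicitly converted to const void* and matches the overload that prints pointer values. (Since C++20 those overloads are explicitly deleted, so the following lines no longer compile.) That is why: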
cout << u"\n Hello World! (u) \n";
cout << U"\n Hello World! (U) \n";
cout << L"\n Hello World! (plain) \n\n";
Prints:
0x47f0580x47f0840x47f0d8
As you can see, there are actually three pointer values printed there: 0x47f058, 0x47f084 and 0x47f0d8.
However, the last one (a wchar_t string) can be printed properly with std::wcout, whose operator<< does accept const wchar_t*:
std::wcout << L"\n Hello World! (plain) \n\n";
prints:
Hello World! (plain)
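1: The u8 literal printed as expected because, before C++20, a u8 literal has type const char[N], and UTF-8 encodes the ASCII characters (code points 0 to 127) as the same single bytes as ASCII. Since C++20, a u8 literal has type const char8_t[N] and can no longer be passed to std::cout directly.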
Answer 2:
1) Narrow multibyte string literal. The type of an unprefixed string literal is const char[N].
2) Wide string literal. The type of an L"..." string literal is const wchar_t[N].
3) UTF-8 encoded string literal. The type of a u8"..." string literal is const char[N] (const char8_t[N] since C++20).
4) UTF-16 encoded string literal. The type of a u"..." string literal is const char16_t[N].
5) UTF-32 encoded string literal. The type of a U"..." string literal is const char32_t[N].
6) Raw string literal. Used to avoid escaping any character: anything between the delimiters becomes part of the string. The prefix, if present, has the same meaning as described above.
std::cout expects narrow (char) strings; when it is given a pointer to any other character type, it prints the pointer value instead, such as 0x47f0580x47f0840x47f0d8. If you're trying to output literals made of wide characters (char16_t, char32_t or wchar_t), use std::wcout for wchar_t strings; char16_t and char32_t strings have no standard console stream that prints them as text, so convert them to a narrow (for example UTF-8) char string first.
Raw string literals are very handy for formatting output. An example of a raw string literal is R"~(This is the text that will be output just as I typed it into the code editor!)~", which is a narrow (char) string. If it is combined with one of the other prefixes (for example LR"(text)"), the raw string literal gets the corresponding type. Here is a very comprehensive reference on string literals.