Reputation: 7157
C++20 added char8_t, which is (I believe) designed to improve support for UTF-8. String literals of the form u8"abc" are required by the standard to be valid UTF-8 stored in a char8_t[] array. These literals can also be converted into std::u8string objects.
However, I can find nothing in the C++ standard that suggests a std::u8string either must, or even should, contain a UTF-8 string. Is there in practice any difference between a std::string and a std::u8string in terms of UTF-8 support?
Upvotes: 2
Views: 391
Reputation: 70693
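No, C++ does not require you to store valid UTF-8 in a std::u8string. From the compiler's perspective, std::u8string has exactly the same semantics as std::string: both are specializations of std::basic_string and differ only in their character type (char8_t versus char).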
But "in practice" you can expect functions that take a std::u8string argument to expect that string to be valid UTF-8. Even if they accept invalid UTF-8, they will definitely never expect your string to be Latin-1 encoded. The same definitely can't be said for std::string.
Upvotes: 2