Reputation: 162
I'm using the following code to format numbers according to the proper locale. In French, groups of digits are separated by a non-breaking space. The string I'm getting seems to be invalid.
std::stringstream ss;
ss.imbue(std::locale("fr_FR.UTF-8"));
ss << 1234;
auto result = ss.str();
Here, result is: {49, -62, 50, 51, 52}. The non-breaking space is represented with -62. It seems to me that it's invalid UTF-8, right?
I expect result to be: {49, -62, -96, 50, 51, 52} (in this case, this seems valid, with the non-breaking space represented with two chars: -62, -96).
Am I missing something? Thanks for your help.
Upvotes: 4
Views: 467
Reputation: 55625
The problem is that std::locale doesn't support multi-byte digit separators, because std::numpunct::thousands_sep only returns a single code unit (a char in this case). As a result, in your case the digit separator NO-BREAK SPACE (U+00A0), encoded in UTF-8 as 0xC2 (-62) 0xA0 (-96), gets truncated and you only see the first code unit 0xC2 (-62), which on its own is an incomplete and therefore invalid UTF-8 sequence.
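One possible workaround, sketched below, is to format through a wide stream, where U+00A0 fits in a single wchar_t code unit, and convert the result to UTF-8 afterwards. This is only a minimal sketch under some assumptions: FrenchLikePunct is a hypothetical facet standing in for the system's fr_FR locale (so the example does not depend on which locales are installed), and to_utf8 and format_grouped are illustrative helpers, not standard library functions. The encoder handles only code points below U+0800, which is enough for this separator.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet that mimics French digit grouping for wide streams.
struct FrenchLikePunct : std::numpunct<wchar_t> {
protected:
    // NO-BREAK SPACE (U+00A0) fits in one wchar_t, so nothing is truncated.
    wchar_t do_thousands_sep() const override { return L'\x00A0'; }
    // Group digits in threes, counting from the right.
    std::string do_grouping() const override { return "\3"; }
};

// Illustrative UTF-8 encoder, limited to code points below U+0800.
std::string to_utf8(const std::wstring& in) {
    std::string out;
    for (wchar_t wc : in) {
        unsigned long cp = static_cast<unsigned long>(wc);
        if (cp < 0x80) {
            out += static_cast<char>(cp);                // one-byte sequence
        } else {
            assert(cp < 0x800);                          // sketch limitation
            out += static_cast<char>(0xC0 | (cp >> 6));  // lead byte
            out += static_cast<char>(0x80 | (cp & 0x3F)); // continuation byte
        }
    }
    return out;
}

// Format a number with grouping in a wide stream, then encode as UTF-8.
std::string format_grouped(long value) {
    std::wostringstream ss;
    // The locale takes ownership of the facet and deletes it when done.
    ss.imbue(std::locale(std::locale::classic(), new FrenchLikePunct));
    ss << value;
    return to_utf8(ss.str());
}
```

With this sketch, format_grouped(1234) yields the bytes {49, -62, -96, 50, 51, 52} that the question expected. On a system where the named locale is installed, the same idea should apply with std::locale("fr_FR.UTF-8") imbued into the wide stream instead of the custom facet, though the separator that locale actually provides may differ between platforms.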
Upvotes: 7