Frozax

Reputation: 162

Invalid UTF-8 data when using std::locale to format numbers in French

I'm using the following code to format numbers with the proper locale. In French, groups of digits are separated by a non-breaking space. The string I'm getting seems to be invalid.

    std::stringstream ss;
    ss.imbue(std::locale("fr_FR.UTF-8"));
    ss << 1234;
    auto result = ss.str();

Here, result is: {49, -62, 50, 51, 52}. The non-breaking space is represented by a single -62. It seems to me that this is invalid UTF-8, right?

I expect result to be: {49, -62, -96, 50, 51, 52} (in this case, this seems valid, with the non-breaking space represented with two chars: -62, -96).

Am I missing something? Thanks for your help.

Upvotes: 4

Views: 467

Answers (1)

vitaut

Reputation: 55625

The problem is that std::locale doesn't support multi-byte digit separators: std::numpunct::thousands_sep returns a single code unit (a char in this case). In your case the digit separator, NO-BREAK SPACE (U+00A0), is encoded in UTF-8 as two bytes, 0xC2 (-62) 0xA0 (-96). It gets truncated to the first code unit, 0xC2 (-62), which on its own is an incomplete and therefore invalid UTF-8 sequence.
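One possible workaround is to do the grouping yourself, so that the separator can be a whole string rather than a single char. The following is only a minimal sketch: group_digits is a hypothetical helper, not a standard library function, and it assumes fixed-size groups and that you pass the separator bytes explicitly (here the UTF-8 encoding of U+00A0) instead of reading them from the locale.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the standard library): formats `value`
// in base 10 and inserts `sep` between groups of `group_size` digits.
// Unlike std::numpunct<char>::thousands_sep(), `sep` may span several bytes.
std::string group_digits(long long value, const std::string& sep,
                         std::size_t group_size = 3) {
    assert(group_size > 0);
    const std::string digits = std::to_string(value);
    const std::size_t first = (digits[0] == '-') ? 1 : 0;  // skip the sign
    const std::size_t count = digits.size() - first;

    std::string out(digits, 0, first);  // start with "-" or an empty string
    for (std::size_t i = 0; i < count; ++i) {
        // A separator goes before digit i when the number of digits still
        // to be written is a non-zero multiple of the group size.
        if (i != 0 && (count - i) % group_size == 0) {
            out += sep;
        }
        out += digits[first + i];
    }
    return out;
}
```

With this, group_digits(1234, "\xC2\xA0") yields the six bytes 0x31 0xC2 0xA0 0x32 0x33 0x34, which is the sequence the question expected. The trade-off is that the separator and group size are no longer taken from the locale, so you have to supply them for each language yourself.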

Upvotes: 7
