Avoid / set character set conversion / encoding for std::cout / std::cerr

Question

General question

Is there a way to avoid character set conversion when writing to std::cout / std::cerr? I do something like:

std::cout << "Ȋ'ɱ ȁ ȖȚƑ-8 Șțȓȉɳɠ (in UTF-8 encoding)" << std::endl;

And I want the output to reach the console with its UTF-8 encoding intact (my console uses UTF-8, but my C++ Standard Library, GNU's libstdc++, doesn't seem to think so for some reason).
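To check whether the stream layer itself converts anything, one can ask the stream's locale directly. A minimal sketch, assuming C++11 or later; the helper name narrow_stream_is_passthrough is hypothetical. The C++ standard specifies std::codecvt<char, char, std::mbstate_t> as a degenerate (identity) conversion, so always_noconv() should report true for narrow streams such as std::cout. This only covers the stream layer; the C runtime and the console sit further down the pipeline.

```cpp
#include <cwchar>   // std::mbstate_t
#include <locale>

// Returns true if the char-to-char codecvt facet of `loc` performs no
// conversion. The standard defines this specialization as an identity
// conversion, so this is expected to hold for every locale.
bool narrow_stream_is_passthrough(const std::locale& loc = std::locale()) {
    using Facet = std::codecvt<char, char, std::mbstate_t>;
    return std::use_facet<Facet>(loc).always_noconv();
}
```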

If there's no way to prevent character encoding conversion, can I set std::cout to use UTF-8 so that it hopefully figures out by itself that no conversion is needed?

Background

I used the Windows API function SetConsoleOutputCP(CP_UTF8); to set my console's encoding to UTF-8. The problem seems to be that UTF-8 does not match the code page typically used for my system's locale, and libstdc++ therefore sets up std::cout with the default ANSI code page instead of correctly recognizing the switch.
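The SetConsoleOutputCP(CP_UTF8) call can be wrapped so the same code also builds on other platforms. A minimal sketch; the function name is hypothetical, and the non-Windows branch simply assumes the terminal already expects UTF-8 (nothing is checked there).

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Switch the attached console's output code page to UTF-8 on Windows.
// SetConsoleOutputCP returns nonzero on success. On other platforms this
// is a no-op that returns true, ASSUMING the terminal already uses UTF-8.
bool enable_utf8_console_output() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}
```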

Edit: It turns out I misinterpreted the issue, and the solution is actually a lot simpler (or not...).

The "Ȋ'ɱ ȁ ȖȚƑ-8 Șțȓȉɳɠ (in UTF-8 encoding)" was just meant as a placeholder (and I shouldn't have used it, as it hid the actual issue).

In my real code the "UTF-8 string" is a Glib::ustring, which is UTF-8 encoded by definition. However, I did not realize that glibmm defines the output operator << in a way that forces a character set conversion.
It uses g_locale_from_utf8() internally which in turn uses g_get_charset() to determine the target encoding.

Unfortunately, the documentation for g_get_charset() states:

On Windows the character set returned by this function is the so-called system default ANSI code-page. That is the character set used by the "narrow" versions of C library and Win32 functions that handle file names. It might be different from the character set used by the C library's current locale.

This simply means that glib neither respects the C locale I set nor attempts to determine the encoding my console actually uses, which makes it basically impossible to produce UTF-8 output with many glib functions. (In fact, this also means that this issue has exactly the same cause as the one that triggered my other question: Force UTF-8 encoding in glib's "g_print()".)
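The mismatch described in the glib documentation can be made visible on Windows by comparing the system default ANSI code page (GetACP()) with the console's output code page (GetConsoleOutputCP()). A sketch, assuming C++17 for std::optional; the function name is hypothetical. After SetConsoleOutputCP(CP_UTF8) the second value should be 65001 (CP_UTF8), while the first typically stays a legacy code page, for example 1252 on many Western European systems.

```cpp
#include <optional>
#include <utility>
#ifdef _WIN32
#include <windows.h>
#endif

// On Windows, return {ANSI code page, console output code page}. If the
// two differ, text that glib converts "for the locale" (i.e. to the ANSI
// code page) will not match what the console decodes.
// Returns std::nullopt on other platforms.
std::optional<std::pair<unsigned, unsigned>> ansi_and_console_codepages() {
#ifdef _WIN32
    return std::make_pair(static_cast<unsigned>(GetACP()),
                          static_cast<unsigned>(GetConsoleOutputCP()));
#else
    return std::nullopt;
#endif
}
```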

I currently consider this a bug in glib (or a serious limitation at best) and will probably open a report in its issue tracker.
