eraxillan

Reputation: 1620

Correct and crossplatform way to use UTF-8 in C++ streams

As I understand from this answer to a similar question, there is a still-unfixed bug in the Visual C++ STL implementation. So you cannot simply write std::cout << raw_utf8_string << std::endl and get correct UTF-8 characters on Windows ;(

NOTE: My test program lives here.

But maybe there is a reasonably simple workaround for this? My thoughts: make a wrapper class such as cout_ex that uses the Windows API function WriteConsoleA for console output.
In its constructor, do this:

#ifdef _WIN32
if (IsValidCodePage (CP_UTF8))
{
    if (!SetConsoleCP (CP_UTF8))
        std::cout << "Could not set console input code page to UTF-8" << std::endl;
    if (!SetConsoleOutputCP (CP_UTF8))
        std::cout << "Could not set console output code page to UTF-8" << std::endl;
}
else
    std::cout << "UTF-8 code page is not supported in your system" << std::endl;
#endif

And in the output method, do this:

char const raw_utf8_text[] = "Blåbærsyltetøy! кошка!";

DWORD raw_written = 0;
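// WriteConsoleA takes the length as a DWORD, so cast the size_t returned by strlen
WriteConsoleA (GetStdHandle (STD_OUTPUT_HANDLE), raw_utf8_text, static_cast<DWORD> (std::strlen (raw_utf8_text)), &raw_written, NULL);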

And don't forget to put the undocumented Visual C++ pragma at the very beginning of the source file:

#pragma execution_character_set("utf-8")
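The steps above could be combined into a custom stream buffer, roughly as sketched below. This is only a sketch under assumptions: utf8_console_buf and bytes_written are hypothetical names (not from any library), the buffer simply collects bytes until the stream is flushed, and when WriteConsoleA fails (which happens when output is redirected to a file) it falls back to plain stdio. On non-Windows systems it writes the bytes to stdout unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <ostream>
#include <streambuf>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical name: a stream buffer that hands raw UTF-8 bytes to the console.
class utf8_console_buf : public std::streambuf {
public:
    utf8_console_buf() {
#ifdef _WIN32
        // Same setup as in the question: switch the console to UTF-8 if possible.
        if (IsValidCodePage(CP_UTF8)) {
            SetConsoleCP(CP_UTF8);
            SetConsoleOutputCP(CP_UTF8);
        }
#endif
    }
    ~utf8_console_buf() override { sync(); }

    // Total number of bytes handed to the console (or to stdout) so far.
    std::size_t bytes_written() const { return written_; }

protected:
    // Collect characters; nothing is sent until the stream is flushed, so
    // bytes written in several pieces reach the console in a single call.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            pending_.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        assert(n >= 0);
        pending_.append(s, static_cast<std::size_t>(n));
        return n;
    }
    int sync() override {
        if (pending_.empty()) return 0;
        const bool ok = write_raw(pending_.data(), pending_.size());
        pending_.clear();
        return ok ? 0 : -1;
    }

private:
    bool write_raw(const char* data, std::size_t size) {
#ifdef _WIN32
        DWORD done = 0;
        HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
        if (WriteConsoleA(h, data, static_cast<DWORD>(size), &done, NULL)) {
            written_ += size;
            return true;
        }
        // WriteConsoleA fails for redirected output; fall through to stdio.
#endif
        const std::size_t n = std::fwrite(data, 1, size, stdout);
        std::fflush(stdout);
        written_ += n;
        return n == size;
    }

    std::string pending_;
    std::size_t written_ = 0;
};
```

To use it, construct a utf8_console_buf, pass its address to a std::ostream, and write UTF-8 strings to that stream, flushing with std::flush or std::endl when the text should appear.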

But maybe someone has a cleaner solution :) Even one that uses external libraries such as Poco or Boost.

I tried to read these articles (1, 2), but found that approach too complicated.
P.S. The overridden stream class should also set the console font to a Unicode one.
P.P.S. Software versions: Windows 8 x64 + Visual C++ 2013 Express.

Upvotes: 3

Views: 1678

Answers (1)

Basilevs

Reputation: 23929

You should imbue a proper codecvt facet (std::codecvt) into your output stream.

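std::locale loc;
std::string encoding = getOutputEncoding(); // not a standard function; see the complete code
loc = std::locale(loc, createCodecvt(encoding)); // createCodecvt is likewise not standard
std::cout.imbue(loc);
std::cout.rdbuf()->pubimbue(loc); // rdbuf() returns a pointer; pubimbue is the public entry point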

Complete code here

This facet should convert the internal encoding to the external one. Due to some bugs in STL implementations, this might be impossible if the internal storage format is a one-byte or multibyte encoding. The workaround is to use a file stream buffer (std::filebuf) instead of the default output buffer.
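As a rough illustration of a facet doing the conversion inside a file stream buffer, here is a minimal sketch. It is not the code linked above: it assumes wide characters as the internal format and uses the standard std::codecvt_utf8 facet, which is deprecated since C++17 but still shipped by the major toolchains. write_utf8_file and read_bytes are hypothetical helper names. Note that with a 16-bit wchar_t (as on Windows) this facet handles only characters in the Basic Multilingual Plane.

```cpp
#include <cassert>
#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17)
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: write wide text to `path`, letting the file buffer
// convert it to UTF-8 through an imbued codecvt facet.
bool write_utf8_file(const std::string& path, const std::wstring& text) {
    assert(!path.empty());
    std::wofstream out;
    // Imbue before opening, so the facet is in place before any I/O happens.
    // The locale takes ownership of the facet allocated with new.
    out.imbue(std::locale(std::locale::classic(),
                          new std::codecvt_utf8<wchar_t>));
    out.open(path.c_str(), std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out << text;
    out.flush();
    return out.good();
}

// Hypothetical helper: read the file back as raw bytes to inspect
// what the facet actually produced.
std::string read_bytes(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling write_utf8_file with a wide string and then read_bytes on the same path shows the UTF-8 byte sequence that the file buffer emitted for that text.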

You might have to implement your own codecvt facet or use my iconv wrapper.

Overall, I still recommend using wide characters for internal processing. This way you might even avoid any extra conversions (besides the system default ones).

Upvotes: 1
