user22683446

C++: Inputting & Outputting UTF-8 on Windows?

I'm not familiar with Windows at all, and I'm struggling to write a function that reads a file containing Chinese characters and runs a regex replacement on its contents.

Roughly:

std::ifstream t(input_file);
std::stringstream buffer;
buffer << t.rdbuf();
std::string page_contents = buffer.str();

...

page_contents = std::regex_replace(page_contents, std::regex("([a-z]{3})你好"), "$1再见");

This works fine on Debian, but the Windows build can't seem to handle the Chinese characters in the file at all. I'm cross-compiling from Debian using MXE (MinGW).

I did some further testing:

#ifdef _WIN32
SetConsoleOutputCP(CP_UTF8);
setvbuf(stdout, nullptr, _IOFBF, 1000);
#endif

std::cout << "你好" << std::endl;

And found that where Debian output "你好" (E4 BD A0 E5 A5 BD), Windows output "ä½ å¥½" (C3 A4 C2 BD C2 A0 C3 A5 C2 A5 C2 BD), i.e. each of the original UTF-8 bytes appears to have been encoded into UTF-8 a second time.

I'm completely at a loss as to how to handle this. Thanks a million in advance to anyone who can point me in the right direction.
