user3819404

Reputation: 621

std::fstream different behavior on msvc and g++ with utf-8

    std::string path("path.txt");
    std::fstream f(path);
    f.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));
    std::string lcpath;
    f >> lcpath;

Reading UTF-8 text from path.txt fails on Windows when compiled with MSVC, in the sense that the contents of lcpath are not treated as UTF-8.

The code below works correctly on Linux when compiled with g++.

    std::string path("path.txt");
    std::fstream ff;
    ff.open(path.c_str());
    std::string lcpath;
    ff>>lcpath;

Does fstream on Windows (MSVC) assume ASCII only by default?

In the first snippet, if I replace string with wstring and fstream with wfstream, lcpath gets the correct value on Windows as well.

EDIT: If I convert the read lcpath using MultiByteToWideChar(), I get the correct representation. But why can't I directly read a UTF-8 string into std::string on Windows?
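One way to narrow this down is to check which bytes actually end up in the std::string, independent of how a debugger or console displays them. The sketch below is only an illustration: `to_hex` and `read_first_token` are hypothetical helper names that do not appear in the original code, and it assumes the file contains plain UTF-8 without a BOM.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: render every byte of s as two lowercase hex digits,
// separated by single spaces (e.g. "c3 a9").
std::string to_hex(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02x ", static_cast<unsigned>(c));
        out += buf;
    }
    if (!out.empty()) out.pop_back();  // drop the trailing space
    return out;
}

// Hypothetical helper: read the first whitespace-delimited token from the
// file through a narrow stream, without imbuing any custom locale.
std::string read_first_token(const std::string& path) {
    assert(!path.empty());  // precondition of this sketch
    std::ifstream in(path, std::ios::binary);
    std::string token;
    in >> token;
    return token;
}
```

Calling `to_hex(read_first_token("path.txt"))` and comparing the result with the expected UTF-8 byte sequence shows whether the bytes arrive intact. If they do, the std::string already holds UTF-8 and the difference lies in how those bytes are interpreted afterwards.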

Upvotes: 0

Views: 142

Answers (1)

Loki Astari

Reputation: 264631

Imbuing an already opened file can be problematic:

http://www.cplusplus.com/reference/fstream/filebuf/imbue/

If loc is not the same locale as currently used by the file stream buffer, either the internal position pointer points to the beginning of the file, or its encoding is not state-dependent. Otherwise, it causes undefined behavior.

The problem here is that when a file is opened and it starts with a BOM, the BOM will usually be read from the file using the currently installed locale. The position pointer is then no longer at the beginning of the file, and we have undefined behavior.

To make sure your locale is set correctly, you must imbue it before opening the file.

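    std::fstream f;
    // Imbue first, while no file is associated with the stream yet.
    f.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));

    // Only now open the file, so nothing is read under the old locale.
    std::string path("path.txt");
    f.open(path);

    std::string lcpath;
    f >> lcpath;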
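The same imbue-before-open ordering can be sketched for the wide-stream variant that the question reports as working. This is a minimal sketch under a few assumptions: `read_utf8_token` is a hypothetical helper name, the portable `std::locale()` is used as the base locale, and the file has no BOM. Note that `<codecvt>` is deprecated since C++17, so some compilers may emit warnings.

```cpp
#include <cassert>
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: read the first whitespace-delimited token of a
// UTF-8 encoded file as a wide string.
std::wstring read_utf8_token(const std::string& path) {
    assert(!path.empty());  // precondition of this sketch

    std::wifstream in;
    // Imbue before open(), so no data is read under the previous locale.
    // The locale takes ownership of the facet allocated with new.
    in.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));
    in.open(path);

    std::wstring token;
    in >> token;  // stays empty if the file could not be opened
    return token;
}
```

With this helper, `read_utf8_token("path.txt")` returns the decoded characters rather than the raw bytes, so a two-byte UTF-8 sequence such as the one for "é" becomes a single wchar_t.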

Upvotes: 1
