Reputation: 621
std::string path("path.txt");
std::fstream f(path);
f.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));
std::string lcpath;
f >> lcpath;
Reading UTF-8 text from path.txt fails on Windows with the MSVC compiler, in the sense that the contents of lcpath are not interpreted as UTF-8.
The code below works correctly on Linux when compiled with g++.
std::string path("path.txt");
std::fstream ff;
ff.open(path.c_str());
std::string lcpath;
ff>>lcpath;
Does fstream on Windows (MSVC) assume ASCII only by default?
In the first snippet, if I replace string with wstring and fstream with wfstream, lcpath gets the correct value on Windows as well.
EDIT: If I convert the lcpath I read using MultiByteToWideChar(), I get the correct representation. But why can't I read a UTF-8 string directly into std::string on Windows?
Upvotes: 0
Views: 142
Reputation: 264631
Imbuing an already opened file can be problematic:
http://www.cplusplus.com/reference/fstream/filebuf/imbue/
If loc is not the same locale as currently used by the file stream buffer, either the internal position pointer points to the beginning of the file, or its encoding is not state-dependent. Otherwise, it causes undefined behavior.
The problem here is that when a file is opened and it starts with a BOM, the BOM will usually be read from the file by the currently installed locale. Thus the position pointer is no longer at the beginning of the file and we have undefined behavior.
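To make the BOM concrete at the byte level: a UTF-8 BOM is the three bytes EF BB BF at the very start of the file. Here is a minimal sketch that reads a file as raw bytes into a std::string and skips those bytes if they are present. The helper name read_first_token_utf8 is made up for illustration, and the sketch assumes you only want the first whitespace-delimited token, as in the question.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not part of any library): returns the first
// whitespace-delimited token of a file as raw UTF-8 bytes, skipping a
// leading UTF-8 BOM (EF BB BF) if the file starts with one.
std::string read_first_token_utf8(const std::string& filename) {
    std::ifstream in(filename, std::ios::binary);

    // Look at the first three bytes of the file.
    char bom[3] = {0, 0, 0};
    in.read(bom, 3);
    const bool has_bom = in.gcount() == 3 &&
                         bom[0] == '\xEF' && bom[1] == '\xBB' && bom[2] == '\xBF';

    // No BOM: clear any eof/fail state and go back to the start.
    if (!has_bom) {
        in.clear();
        in.seekg(0);
    }

    std::string token;
    in >> token;  // bytes are copied as-is; no encoding conversion happens
    return token;
}
```

The returned std::string holds the UTF-8 bytes unchanged. Whether they display correctly depends on what later interprets those bytes.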
To make sure your locale is set correctly, you must imbue it before opening the file.
std::fstream f;
f.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));
std::string path("path.txt");
f.open(path);
std::string lcpath;
f >> lcpath;
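One caveat: std::codecvt_utf8<wchar_t> is a facet for wide streams. A narrow std::fstream uses the std::codecvt<char, char, std::mbstate_t> facet, so the imbued facet does not change what ends up in a std::string. Here is a sketch of the same "imbue before open" order with a wide stream, which matches the wfstream observation in the question. The helper name read_first_token_wide is hypothetical, and std::locale() is used as a portable stand-in for std::locale::empty(), which I believe is an MSVC extension. Note that <codecvt> is deprecated since C++17.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper (for illustration only): reads the first
// whitespace-delimited token of a UTF-8 encoded file as a wide string.
std::wstring read_first_token_wide(const std::string& filename) {
    std::wifstream in;

    // Imbue BEFORE opening, so the stream buffer never sees file data
    // under a different locale.
    in.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));
    in.open(filename);

    std::wstring token;
    in >> token;  // UTF-8 bytes are decoded into wchar_t code units here
    return token;
}
```

If you need the result in a std::string afterwards, you would still have to convert it back yourself.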
Upvotes: 1