Reputation: 45
I want to read this Cyrillic text from a .txt file: аааааааааааа
std::wstring str;
std::wifstream in(path);
std::getline(in, str);
in.close();
But the content of str is: аааааааааааа
(The file encoding is UTF-8. I inspected the string in the debugger, not in the console.)
I also tried changing the file encoding to UTF-16 (LE and BE) and got ÿþ000000000000 and þÿ000000000000 respectively.
Also, I found this solution, but as you can see, it didn't help.
Upvotes: 0
Views: 1379
Reputation: 31599
On Windows you have to open the file in binary mode and then apply the UTF-16 facet; otherwise the stream assumes the system's default code page. See the example below.
Note that it is common to store data as UTF-8, even in Windows applications. The Windows APIs expect UTF-16, so you can read and write the file as UTF-8 and convert to and from UTF-16 as needed.
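// define before including any standard header to silence the codecvt deprecation warnings
#define _SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

std::wstring str;
std::wifstream in(path, std::ios::binary);
// decode the bytes as UTF-16 LE instead of the default code page;
// a leading BOM is kept as U+FEFF unless std::consume_header is added to the mode
in.imbue(std::locale(in.getloc(),
    new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>));
std::getline(in, str);
in.close();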
You can also use pubsetbuf to avoid codecvt warnings:
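#include <fstream>
#include <iostream>
#include <string>

// assumes the Windows (MSVC) library: with a wide buffer set, the binary
// stream hands back the file's raw 16-bit units as wchar_t
std::wifstream in(path, std::ios::binary);
wchar_t wbuf[128] = { 0 };
in.rdbuf()->pubsetbuf(wbuf, 128);

// BOM check: a UTF-16 LE file starts with bytes FF FE, which read as 0xFEFF
wchar_t bom{};
in.read(&bom, 1);
if (bom == 0xfeff)
    std::cout << "UTF16-LE\n";

// read the first line after the BOM
std::wstring str;
std::getline(in, str);
in.close();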
Upvotes: 1