Boloto

Reputation: 45

Can't read Cyrillic text from a .txt file

I want to read this Cyrillic text from a .txt file: аааааааааааа

std::wstring str;
std::wifstream in(path);
std::getline(in, str);
in.close();

But the content of str is: аааааааааааа (the file is encoded as UTF-8; I inspected the string in the debugger, not in the console).

I tried changing the file encoding to UTF-16 (LE and BE) and got ÿþ000000000000 and þÿ000000000000, respectively.

Also, I found this solution, but as you can see, it didn't help.

Upvotes: 0

Views: 1379

Answers (1)

Barmak Shemirani

Reputation: 31599

On Windows you have to open the file in binary mode and then apply the UTF-16 facet; otherwise the stream assumes the system's default code page. The codecvt_utf16 example below shows this.

Note that it is common to store data as UTF-8, even in Windows applications. The Windows APIs expect UTF-16, so you can read and write the file as UTF-8 and convert to and from UTF-16 as needed.
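
For the UTF-8 route, one option is to read the raw bytes with a narrow stream and decode them yourself. The following is a minimal, portable sketch rather than a definitive implementation: utf8_to_utf16 and read_first_line_utf8 are hypothetical helper names, std::u16string stands in for the UTF-16 std::wstring used on Windows, and validation is deliberately minimal (overlong encodings and encoded surrogates are not rejected).

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;      // code point being assembled
        std::size_t len = 0;  // total bytes in this sequence
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += len;

        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of range");

        if (cp >= 0x10000) {
            // Outside the Basic Multilingual Plane: emit a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

// Hypothetical helper: read the first line of a UTF-8 file as UTF-16.
std::u16string read_first_line_utf8(const std::string& path) {
    std::ifstream in(path, std::ios::binary);  // raw bytes, no code page translation
    std::string line;
    std::getline(in, line);
    if (!line.empty() && line.back() == '\r')
        line.pop_back();                       // drop the CR of a CRLF line ending
    if (line.compare(0, 3, "\xEF\xBB\xBF") == 0)
        line.erase(0, 3);                      // skip a UTF-8 byte order mark
    return utf8_to_utf16(line);
}
```

On Windows the resulting code units can be copied into a std::wstring, since wchar_t is 16 bits wide there; MultiByteToWideChar with CP_UTF8 is the native API for the same conversion.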

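// Silences the <codecvt> deprecation warning in MSVC;
// define it before including any standard header.
#define _SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

std::wstring str;
std::wifstream in(path, std::ios::binary);
// Decode the byte stream as UTF-16 LE instead of the default code page.
in.imbue(std::locale(in.getloc(),
    new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>));
std::getline(in, str);
in.close();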

You can also use pubsetbuf to avoid codecvt warnings:

std::wifstream in(path, std::ios::binary);
wchar_t wbuf[128] = { 0 };
in.rdbuf()->pubsetbuf(wbuf, 128);

//BOM check
wchar_t bom{};
in.read(&bom, 1);
if(bom == 0xfeff)
    std::cout << "UTF16-LE\n";

//read file
std::wstring str;
std::getline(in, str);
in.close();
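
The ÿþ and þÿ prefixes in the question are the UTF-16 byte order marks (bytes FF FE for little-endian, FE FF for big-endian) as they appear when shown in a single-byte code page such as Windows-1252. Below is a rough sketch of detecting the encoding from the leading bytes before choosing how to decode. The Bom enum and detect_bom are hypothetical names, and UTF-32 marks are not handled (the UTF-32 LE mark starts with the same two bytes as the UTF-16 LE one).

```cpp
#include <string>

enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Hypothetical helper: classify a buffer by its leading byte order mark.
// `bytes` is assumed to hold the raw start of the file, read in binary mode.
Bom detect_bom(const std::string& bytes) {
    // compare(pos, len, s) only matches when the buffer is long enough,
    // so short or empty input falls through to Bom::None.
    if (bytes.compare(0, 3, "\xEF\xBB\xBF") == 0) return Bom::Utf8;
    if (bytes.compare(0, 2, "\xFF\xFE") == 0)     return Bom::Utf16LE;
    if (bytes.compare(0, 2, "\xFE\xFF") == 0)     return Bom::Utf16BE;
    return Bom::None;
}
```

A file without a mark returns Bom::None, in which case the encoding has to be known or guessed by other means.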

Upvotes: 1
