I'm trying to read a file that is encoded as UTF-16LE with a BOM. I tried this code:
#include <iostream>
#include <fstream>
#include <locale>
#include <codecvt>
int main() {
    std::wifstream fin("/home/asutp/test");
    fin.imbue(std::locale(fin.getloc(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
    if (!fin) {
        std::cout << "!fin" << std::endl;
        return 1;
    }
    if (fin.eof()) {
        std::cout << "fin.eof()" << std::endl;
        return 1;
    }
    std::wstring wstr;
    getline(fin, wstr);
    std::wcout << wstr << std::endl;
    if (wstr.find(L"Test") != std::string::npos) {
        std::cout << "Found" << std::endl;
    } else {
        std::cout << "Not found" << std::endl;
    }
    return 0;
}
The file can contain Latin and Cyrillic characters. I created the file containing the string "Test тест", and this code prints:
/home/asutp/CLionProjects/untitled/cmake-build-debug/untitled
Not found
Process finished with exit code 0
I'm on Linux Mint 18.3 x64, CLion 2018.1.
Tried
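To reproduce this without depending on a particular editor, a file with the same content can be generated byte by byte. A minimal sketch, assuming the target is UTF-16LE with a BOM; to_utf16le_with_bom and write_bytes are hypothetical helper names, not part of any library:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: encodes UTF-16 text as UTF-16LE bytes preceded by a BOM (FF FE).
// Each char16_t code unit is written low byte first, regardless of host endianness.
std::string to_utf16le_with_bom(const std::u16string& text) {
    std::string bytes = "\xFF\xFE";  // little-endian byte order mark
    for (char16_t unit : text) {
        bytes.push_back(static_cast<char>(unit & 0xFF));         // low byte
        bytes.push_back(static_cast<char>((unit >> 8) & 0xFF));  // high byte
    }
    return bytes;
}

// Hypothetical helper: writes raw bytes to `path` in binary mode.
// Returns false if the file could not be opened or written.
bool write_bytes(const std::string& path, const std::string& bytes) {
    std::ofstream out(path, std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    out.close();  // flush now so write errors show up in the stream state
    return !out.fail();
}
```

Encoding u"Test тест" this way gives 20 bytes starting FF FE 54 00 65 00; each Cyrillic letter also takes two bytes, for example т is stored as 42 04.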
Ideally you should save files as UTF-8: Windows supports UTF-8 well (apart from displaying Unicode in the console window), whereas POSIX systems have only limited UTF-16 support. Even Microsoft products favor UTF-8 when saving files on Windows.
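With a UTF-8 file, no conversion facet is needed: the text can be read into a plain std::string and searched byte for byte, because a UTF-8 needle matches the same bytes in the haystack. A minimal sketch, assuming the file is UTF-8 with an optional BOM; strip_utf8_bom and read_first_line_utf8 are hypothetical names:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: drops a leading UTF-8 BOM (EF BB BF), which some Windows editors write.
std::string strip_utf8_bom(std::string s) {
    const std::string bom = "\xEF\xBB\xBF";
    if (s.compare(0, bom.size(), bom) == 0) s.erase(0, bom.size());
    return s;
}

// Hypothetical helper: reads the first line of a UTF-8 file into a std::string.
// Returns an empty string if the file cannot be read.
// In binary mode a Windows (CRLF) line ending leaves a trailing '\r' in the result.
std::string read_first_line_utf8(const std::string& path) {
    std::ifstream fin(path, std::ios::binary);
    std::string line;
    std::getline(fin, line);
    return strip_utf8_bom(line);
}
```

With this, read_first_line_utf8(path).find("Test") != std::string::npos is enough to check for the Latin text, and a Cyrillic needle such as "тест" works the same way.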
Alternatively, you can read the UTF-16 file into a buffer and convert it to UTF-8 with std::codecvt_utf8_utf16:
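#include <codecvt>
#include <fstream>
#include <locale>
#include <string>
int main() {
    std::ifstream fin("utf16.txt", std::ios::binary);
    fin.seekg(0, std::ios::end);
    size_t size = (size_t)fin.tellg();
    if (!fin || size < 2) return 1; //could not open, or too short for a BOM
    //skip BOM
    fin.seekg(2, std::ios::beg);
    size -= 2;
    //one char16_t per two bytes and no extra element, so no '\0' ends up in utf8;
    //reading the bytes straight into char16_t assumes a little-endian host
    std::u16string u16(size / 2, u'\0');
    fin.read((char*)&u16[0], u16.size() * 2);
    std::string utf8 = std::wstring_convert<
        std::codecvt_utf8_utf16<char16_t>, char16_t>{}.to_bytes(u16);
}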
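Or read the raw bytes and assemble the UTF-16 code units yourself, which does not depend on the host's byte order:
#include <codecvt>
#include <fstream>
#include <locale>
#include <sstream>
#include <string>
int main() {
    std::ifstream fin("utf16.txt", std::ios::binary);
    //skip BOM
    fin.seekg(2);
    //read as raw bytes
    std::stringstream ss;
    ss << fin.rdbuf();
    std::string bytes = ss.str();
    //make sure len is divisible by 2
    size_t len = bytes.size();
    if (len % 2) len--;
    std::wstring sw;
    for (size_t i = 0; i < len;)
    {
        //little-endian: low byte first
        int lo = bytes[i++] & 0xFF;
        int hi = bytes[i++] & 0xFF;
        sw.push_back(static_cast<wchar_t>(hi << 8 | lo));
    }
    //each wchar_t element holds one UTF-16 code unit, as codecvt_utf8_utf16 expects
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
    std::string utf8 = convert.to_bytes(sw);
}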
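Both snippets above skip the first two bytes unconditionally, so a file saved without a BOM would lose its first character. A minimal sketch that skips the FF FE BOM only when it is present and builds char16_t code units from the bytes explicitly, so it does not depend on host endianness; decode_utf16le and read_utf16le_file_as_utf8 are hypothetical names:

```cpp
#include <codecvt>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: decodes UTF-16LE bytes into UTF-16 code units.
// A leading BOM (FF FE) is skipped only if present; a trailing odd byte
// cannot form a code unit and is ignored.
std::u16string decode_utf16le(const std::string& bytes) {
    std::size_t i = 0;
    if (bytes.size() >= 2 &&
        static_cast<unsigned char>(bytes[0]) == 0xFF &&
        static_cast<unsigned char>(bytes[1]) == 0xFE) {
        i = 2;
    }
    std::u16string units;
    units.reserve((bytes.size() - i) / 2);
    for (; i + 1 < bytes.size(); i += 2) {
        unsigned lo = static_cast<unsigned char>(bytes[i]);  // low byte comes first
        unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        units.push_back(static_cast<char16_t>(hi << 8 | lo));
    }
    return units;
}

// Hypothetical helper: reads a whole UTF-16LE file and returns it as UTF-8.
// std::wstring_convert and std::codecvt_utf8_utf16 are deprecated since C++17;
// to_bytes throws std::range_error if the input is not valid UTF-16.
std::string read_utf16le_file_as_utf8(const std::string& path) {
    std::ifstream fin(path, std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(fin)),
                      std::istreambuf_iterator<char>());
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(decode_utf16le(bytes));
}
```

Keeping the byte decoding in a separate function makes it easy to check on in-memory data before involving the file system.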
Replace std::string::npos with std::wstring::npos, and your code should work:
...
//std::wcout << wstr << std::endl;
if (wstr.find(L"Test") == std::wstring::npos) {
    std::cout << "Not Found" << std::endl;
} else {
    std::cout << "found" << std::endl;
}
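Separately from the npos comparison, the question's facet is built with std::consume_header only. std::codecvt_utf16 assumes big-endian input unless std::little_endian is set; per the standard, a BOM consumed through consume_header should select the byte order, but passing std::little_endian as well states the intended order explicitly instead of relying on BOM detection. A sketch under that assumption; read_first_line_utf16le is a hypothetical name:

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: reads the first line of a UTF-16LE file into a std::wstring.
// The facet is imbued before open(); little_endian fixes the byte order and
// consume_header makes the facet swallow the BOM instead of returning U+FEFF.
std::wstring read_first_line_utf16le(const std::string& path) {
    std::wifstream fin;
    fin.imbue(std::locale(fin.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff,
            static_cast<std::codecvt_mode>(std::little_endian | std::consume_header)>));
    fin.open(path, std::ios::binary);
    std::wstring line;
    std::getline(fin, line);
    return line;
}
```

The std::locale constructor takes ownership of the facet, so the new expression does not leak. The returned line can then be searched with find(L"Test") as in the question.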