Reputation: 29996
There are plenty of questions on SO about this, but most of them do not mention writing a wstring back to a file. For example, I found this for reading:
// open as a byte stream
std::wifstream fin("/testutf16.txt", std::ios::binary);
// apply BOM-sensitive UTF-16 facet
fin.imbue(std::locale(fin.getloc(),
    new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
// read
std::wstring ws;
for (wchar_t c; fin.get(c); )
{
    std::cout << std::showbase << std::hex << c << '\n';
    ws.push_back(c);
}
I tried similar stuff for writing:
std::wofstream wofs("/utf16dump.txt", std::ios::binary);
wofs.imbue(std::locale(wofs.getloc(),
    new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
wofs << ws;
but it produces garbage (or at least Notepad++ and vim can't interpret it). As mentioned in the title, I'm on Windows, native C++, VS 2010.
Input file:
t€stUTF16✡
test
This is the result:
t€stUTF16✡
test
Converted to hex:
0000000: 7400 ac20 7300 7400 5500 5400 4600 3100 t.. s.t.U.T.F.1.
0000010: 3600 2127 0d00 0a00 7400 6500 7300 7400 6.!'....t.e.s.t.
0000020: 0a
...
vim normal output:
t^@¬ s^@t^@U^@T^@F^@1^@6^@!'^M^@ ^@t^@e^@s^@t^@
EDIT: I ended up using UTF8. Andrei Alexandrescu says it is the best encoding so no big loss. :)
Upvotes: 1
Views: 7023
Reputation: 679
It is easy if you use the C++11 standard (because it adds a lot of includes, like "utf8", which solve this problem for good).
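To illustrate the C++11 route, here is a minimal sketch. It assumes the facilities meant are the `<codecvt>` header with `std::codecvt_utf8` and `std::wstring_convert`; the helper name `to_utf8` is made up for this example. Note that `<codecvt>` was deprecated in C++17, and that where `wchar_t` is 16 bits (as on Windows) `std::codecvt_utf8<wchar_t>` only covers the BMP.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: encode a wide string as UTF-8 bytes.
std::string to_utf8(const std::wstring& ws)
{
    // wstring_convert wraps the facet, so no stream or locale setup is needed.
    std::wstring_convert<std::codecvt_utf8<wchar_t> > conv;
    return conv.to_bytes(ws);
}
```

The resulting string can then be written through a plain `std::ofstream` opened with `std::ios::binary`.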
But if you want multi-platform code that works with older standards, you can use this method to write with streams. Add stxutif.h to your project from the sources above. Then open the file in ANSI mode and add the BOM to the start of the file, like this:
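std::ofstream fs;
fs.open(filepath, std::ios::out | std::ios::binary);
// UTF-8 byte order mark: EF BB BF
const unsigned char smarker[3] = { 0xEF, 0xBB, 0xBF };
// use write(): operator<< would treat the array as a C string,
// and it has no terminating null
fs.write(reinterpret_cast<const char*>(smarker), sizeof(smarker));
fs.close();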
Then open the file as UTF-8 and write your content there:
std::wofstream fs;
fs.open(filepath, std::ios::out|std::ios::app);
std::locale utf8_locale(std::locale(), new utf8cvt<false>);
fs.imbue(utf8_locale);
fs << .. // Write anything you want...
Upvotes: 2
Reputation: 283614
For output, you want to use generate_header instead of consume_header.
See http://en.cppreference.com/w/cpp/locale/codecvt_mode
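A minimal sketch of what that could look like for the question's code. The helper name `write_utf16le_with_bom` is made up here, and adding `std::little_endian` is my assumption, chosen to match the byte order in the hex dump (without it, `codecvt_utf16` writes big-endian). The two mode flags have to be cast back to `std::codecvt_mode` because OR-ing them yields an `int`.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: write `text` to `path` as UTF-16LE, preceded by a BOM.
bool write_utf16le_with_bom(const char* path, const std::wstring& text)
{
    // Binary mode, so no newline translation touches the encoded bytes.
    std::wofstream out(path, std::ios::binary);
    if (!out)
        return false;

    // generate_header emits the BOM on output; little_endian picks the byte order.
    const std::codecvt_mode mode = static_cast<std::codecvt_mode>(
        std::generate_header | std::little_endian);
    out.imbue(std::locale(out.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, mode>));

    out << text;
    out.flush();  // push the data through the facet before checking the state
    return out.good();
}
```

Called as `write_utf16le_with_bom("/utf16dump.txt", ws)`, this should give a file that starts with FF FE, which is what editors look for when detecting UTF-16LE.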
Upvotes: 1
Reputation: 283614
Your similar code -- isn't. You removed the std::ios::binary flag, despite the fact that the documentation says:
The byte stream should be written to a binary file; it can be corrupted if written to a text file.
NL->CRLF conversion in text mode isn't going to do pretty things to UTF-16 files, since it will insert one byte 0x0D instead of two bytes 0x00 0x0D.
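To make the damage concrete, here is a small model of that translation. It is not the real CRT code, just a sketch: `simulate_text_mode` is a made-up function that inserts a single 0x0D byte before every 0x0A byte, which is the behaviour described above. The sample bytes assume UTF-16LE, as in the question's hex dump.

```cpp
#include <string>

// Hypothetical model of text-mode output on Windows:
// each 0x0A byte gets one 0x0D byte inserted in front of it.
std::string simulate_text_mode(const std::string& bytes)
{
    std::string out;
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        if (bytes[i] == '\n')
            out.push_back('\r');  // one byte, not a full UTF-16 code unit
        out.push_back(bytes[i]);
    }
    return out;
}
```

Feeding it the UTF-16LE bytes for "t\n" (74 00 0A 00) gives 74 00 0D 0A 00. That is five bytes, so every code unit after the newline is shifted by one byte and decodes as garbage.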
Upvotes: 3