Reputation: 4109
I have some text files which are encoded using UTF-8. Is there a way to read them using the C++ stream classes (std::wifstream, for example)?
I have seen external libraries like Boost and some CodeProject code snippets, but I don't want to pull in a dependency just for this.
On Linux it somehow works by calling imbue(std::locale("en_US")), but not on Windows. I think the problem is that Windows assumes a wifstream carries UTF-16 encoded data. Can't I somehow tell the wifstream class to use UTF-8 rather than UTF-16?
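For reference, one suggestion I have come across (a sketch only, not something I have verified on Windows) is to imbue the stream with a UTF-8 conversion facet such as std::codecvt_utf8, which is deprecated since C++17 but still ships with the major standard libraries:

```cpp
// Sketch: imbue a wifstream with a UTF-8 decoding facet.
// std::codecvt_utf8 is deprecated since C++17 but still widely available.
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

std::wstring read_utf8_file(const char* path) {
    std::wifstream in(path);
    // Replace the stream's conversion facet with one that decodes UTF-8
    // bytes into wide characters.
    in.imbue(std::locale(in.getloc(), new std::codecvt_utf8<wchar_t>));
    std::wstring line, all;
    while (std::getline(in, line)) {
        all += line;
        all += L'\n';
    }
    return all;
}
```

Note that on Windows wchar_t is 16 bits, so this facet only covers the BMP there; whether it behaves the way I want is exactly what I am unsure about.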
Upvotes: 3
Views: 2626
Reputation: 88225
In addition to just reading the bytes from the file normally and treating them as UTF-8 (i.e., passing them only to things that expect UTF-8, never to anything that expects locale-encoded strings), Windows has another way to read in UTF-8.
You can set a 'UTF-8' mode on a file descriptor and then use wide character input and output on that descriptor; Microsoft's C runtime will handle transforming the wide characters to and from UTF-8 encoded byte streams:
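For that first approach, a minimal sketch of slurping the raw bytes into a std::string (the string then simply holds UTF-8 data, no decoding involved):

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Read a file's raw bytes into a std::string. The bytes are kept as-is,
// so if the file is UTF-8 the resulting string holds UTF-8.
std::string read_file_bytes(const char* path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}
```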
#include <fcntl.h>
#include <io.h>
#include <stdio.h>

int main(void) {
    // Switch stdout to UTF-8 translation mode: the CRT converts the
    // wide characters written below into UTF-8 bytes.
    _setmode(_fileno(stdout), _O_U8TEXT);
    wprintf(L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd\n");
    return 0;
}
If you run the above program with output redirected to a file, you will get a UTF-8 encoded file.
Setting one of these Unicode modes on a file descriptor has the additional effect on consoles that wide character output will actually work on the console. I'm not sure why exactly Microsoft chose "broken" as the default, but at least there's a way to enable a "not broken" mode.
Upvotes: 2
Reputation: 96177
You can read UTF-8 files on Windows perfectly normally; the only problem is when you want to do something with them.
Almost all Windows API calls take UTF-16 or MBCS strings, so you will need to convert UTF-8 to UTF-16 (or MBCS) whenever you pass text to a Windows API; see Converting C-Strings from Local Encoding to UTF8
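On Windows itself the usual tool for this conversion is MultiByteToWideChar(CP_UTF8, ...). As a portable sketch, the standard library's std::wstring_convert (deprecated since C++17 but still available) performs the same UTF-8 to UTF-16 conversion:

```cpp
#include <codecvt>
#include <locale>
#include <string>

// UTF-8 -> UTF-16 using a standard-library facet (deprecated in C++17
// but still shipped). On Windows you would more commonly call
// MultiByteToWideChar(CP_UTF8, ...) instead.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);
}
```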
Upvotes: 0