Michael Surette

reading a UTF-16 file in Linux

I have the following program which reads a file into a string buffer.

#include <fstream>
#include <iostream>
#include <string>

using namespace std;

constexpr int BUFSIZE = 1024;

int main(int argc, char *argv[])
{
    std::ifstream ifs(argv[1], std::ifstream::binary);
    if(!ifs)
        return 1;

    string buffer(BUFSIZE, L'\0');
    ifs.read(&buffer[0], BUFSIZE);

    cerr << ifs.gcount() << endl;

    return 0;
}

It prints out the expected 1024.

The following program, which is supposed to read into a wstring buffer, doesn't work though.

#include <fstream>
#include <iostream>
#include <string>

using namespace std;

constexpr int BUFSIZE = 1024;

int main(int argc, char *argv[])
{
    std::wifstream ifs(argv[1], std::ifstream::binary);
    if(!ifs)
        return 1;

    wstring buffer(BUFSIZE, L'\0');
    ifs.read(&buffer[0], BUFSIZE);

    cerr << ifs.gcount() << endl;

    return 0;
}

It prints out 0 with the same file.

As you can see, the only difference is changing the stream to a wifstream and the buffer to a wstring.

I've tried both g++ 8.2.1 and clang++ 6.0.1 under OpenSUSE Tumbleweed.

Where is the problem/my error?


Answers (1)

eerorika

You should be using std::basic_ifstream<char16_t> and std::u16string for UTF-16. std::wifstream and std::wstring are not appropriate because the width of wchar_t is implementation defined. In Linux in particular, it is (usually?) 32 bits wide.

Same for character literals. You should use u'\0' etc. instead of L'\0'.
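
For reference, here is one way to get the file contents into a std::u16string without going through a char16_t stream at all: open the file as plain bytes with a narrow ifstream and assemble the char16_t code units by hand. This is only a sketch, not the exact approach described above; it assumes the file is little-endian UTF-16 (a leading BOM is skipped if present), does no validation of the data, and the helper name read_utf16le is made up for the example.

#include <cstddef>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

// Read a UTF-16LE encoded file into a std::u16string.
// Assumes little-endian data with an optional BOM; no validation is done.
std::u16string read_utf16le(const char *path)
{
    std::ifstream ifs(path, std::ifstream::binary);
    if(!ifs)
        return {};

    // Pull the whole file in as raw bytes.
    std::vector<unsigned char> bytes((std::istreambuf_iterator<char>(ifs)),
                                     std::istreambuf_iterator<char>());

    std::u16string result;
    result.reserve(bytes.size() / 2);

    std::size_t i = 0;
    // Skip a little-endian BOM (FF FE) if the file starts with one.
    if(bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        i = 2;

    // Combine each pair of bytes into one char16_t code unit (little endian).
    for(; i + 1 < bytes.size(); i += 2)
        result.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));

    return result;
}

int main(int argc, char *argv[])
{
    if(argc < 2)
        return 1;

    std::u16string text = read_utf16le(argv[1]);
    std::cerr << text.size() << " UTF-16 code units read\n";

    return 0;
}

Doing the byte-to-code-unit conversion by hand keeps the endianness handling explicit and avoids depending on whatever codecvt facet the stream's locale happens to provide.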
