Reading Unicode text files in Visual C++

Question

I'm trying to read a simple Unicode (UTF-16) text file containing just a few numbers in Visual C++. This seemed like a trivial task, but I can't get it to read the file in the correct encoding.

My file looks like this:

1337 42 23

Since it's UTF-16 (little-endian), it also starts with the 0xFF 0xFE byte order mark (BOM).

I've tried wifstream and fwscanf(), but both stumble on the BOM, and even after skipping it, both read only "1". They get confused by the 0x00 bytes, i.e. they're not actually reading the file as UTF-16.

So the question is: how do you read and parse a simple Unicode file in a Unicode Visual C++ app?

Here's my source (fwscanf version):

#include <stdio.h>
#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[])
{
    int x;
    FILE * f = _wfopen(L"bla.txt", L"r+");
    if (!f) return -1;

    fseek(f, 2, SEEK_SET); // skip the BOM mark

    fwscanf(f, L"%d", &x);
    wprintf(L"Number read: %d\n", x);

    fclose(f);
    return 0;
}

And the output is:

Number read: 1

Hans Passant · Accepted Answer

The Microsoft CRT has supported BOM auto-detection since VS2005. You enable it with the "ccs" attribute in the mode argument, like this:

FILE * f = _wfopen(L"c:\\temp\\test.txt", L"rt, ccs=UNICODE");

It falls back to ANSI if the file doesn't have a BOM. For troublemakers like that, you can name the encoding explicitly with "ccs=UTF-8" or "ccs=UTF-16LE". This is, of course, non-standard.
