user1978386

Reputation: 257

How to open and read the content of a file with unicode path or filename by using standard API?

How can I open a file whose path or file name contains Unicode characters, and read or write its content, without using any special API? Is it possible using only the standard library, or failing that only the Windows API? I tried std::wifstream to open a file, as in the code sample below, but it doesn't compile. It looks like the constructor takes a 'const char*' argument rather than a 'const wchar_t*'. I'm using the TDM-GCC 4.7.1 compiler that is included with the Dev-C++ IDE.

#ifndef UNICODE
#define UNICODE
#endif
...
#include <clocale>
#include <windows.h>
#include <fstream>
...
int main(int argc, char **argv)
{
    setlocale(LC_ALL, "Polish_Poland.852") ;
    ...
    fileCompare(first, second) ;
    ...
}
...
bool fileCompare(wstring first, wstring second)  // This function doesn't compile !
{
    using namespace std ;
    wifstream fin0(first.c_str(), ios::binary) ;
    wifstream fin1(second.c_str(), ios::binary) ;
    ...
}

A complete example:

#ifndef UNICODE
#define UNICODE
#endif

#include <clocale>
#include <conio.h>
#include <windows.h>
#include <fstream>
#include <string>
#include <iostream>

using namespace std ;

bool fileCompare(wstring first, wstring second) ;

int main(int argc, char **argv)
{
    setlocale(LC_ALL, "Polish_Poland.852") ;

    wstring first, second ;
    first = L"C:\\A.dat" ;
    second = L"C:\\E.dat" ;

    fileCompare(first, second) ;

    getch() ;
    return 0 ;
}

bool fileCompare(wstring first, wstring second)  // This function doesn't compile !
{
    wifstream fin0(first.c_str(), ios::binary) ;
    wifstream fin1(second.c_str(), ios::binary) ;

}

Also, when I replace L"C:\\A.dat" and L"C:\\E.dat" with strings containing Polish characters, I get an error about an illegal byte sequence.

Upvotes: 3

Views: 2594

Answers (1)

Oncaphillis

Reputation: 1908

wifstream doesn't deal with the issue of file name encoding. As far as I know, the file name arguments of both wifstream and ifstream are char based, not wchar_t based. You will have to provide the file name in the char encoding used by your OS, e.g. Latin-1, UTF-8, etc.
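
For readers on a newer toolchain, here is a minimal sketch of a standard-library-only route, assuming a C++17 compiler (TDM-GCC 4.7.1 predates this). Since C++17 the file streams also accept a std::filesystem::path, and a path can be built from a wide string. The helper name readAll is hypothetical and only for illustration.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: returns the raw bytes of a file whose name is given
// as a wide string, or an empty string if the file cannot be opened.
std::string readAll(const std::wstring& wideName)
{
    // The path object converts the wide name to the platform's native form.
    // On Windows the native form is wchar_t, so no narrowing is involved.
    const std::filesystem::path p(wideName);

    std::ifstream in(p, std::ios::binary);
    if (!in)
        return std::string();

    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

A call such as readAll(L"C:\\A.dat") would then replace the wifstream construction in the question. How non-ASCII wide names are converted on non-Windows systems is implementation specific, so treat that part as untested here.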

What wifstream does give you is the ability to read a stream of wchar_t. You can tell the stream which input encoding to expect by imbuing it with a locale, e.g.:

 // We expect the file to be UTF8 encoded
 std::locale locale("en_US.utf8");
 fin0.imbue(locale);
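
Since a named locale such as "en_US.utf8" may not exist on every system, here is a hedged alternative sketch that attaches only a UTF-8 conversion facet. It swaps in std::codecvt_utf8 from <codecvt>, which is deprecated since C++17 but still shipped by common toolchains. The helper name readFirstLineUtf8 is hypothetical.

```cpp
#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17)
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: reads the first line of a UTF-8 encoded file
// as wide characters. Returns an empty string if the file cannot be opened.
std::wstring readFirstLineUtf8(const char* fileName)
{
    std::wifstream in;

    // Imbue before opening so the facet is in place for the first read.
    // The locale takes ownership of the facet allocated with new.
    in.imbue(std::locale(std::locale::classic(),
                         new std::codecvt_utf8<wchar_t>));
    in.open(fileName, std::ios::binary);

    std::wstring line;
    std::getline(in, line);
    return line;
}
```

Note that this only affects how the file content is decoded. The file name is still passed as a narrow string.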

EDIT: If you need to convert your file names (or any string) from wchar_t into the appropriate char encoding, you can dig deeper into the codecvt facets of locales.

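#include <cwchar>
#include <locale>
#include <string>

// Converts wchar_t to the "pl_PL.iso88592" encoding
std::string to_string(const std::wstring & wstr)
{
    typedef std::codecvt< wchar_t, char, std::mbstate_t > ccvt_t;

    // Throws std::runtime_error if this locale name is not available
    std::locale loc("pl_PL.iso88592");

    const ccvt_t & facet = std::use_facet<ccvt_t>( loc );

    std::string s;
    {
        std::mbstate_t st = std::mbstate_t();

        const wchar_t *wac = wstr.c_str();         // start of the unconverted input
        const wchar_t *wou = wac + wstr.length();  // end of the input
        const wchar_t *wnx = wac;                  // set by out() to the next unconverted character

        ccvt_t::result r = ccvt_t::ok;

        // Convert in chunks until the input is consumed or an error occurs
        while(wou!=wnx && (r==ccvt_t::ok || r==ccvt_t::partial))
        {
            const int l = 100;
            char cou[l];        // local buffer, so the function stays reentrant
            char *cnx = cou;
            r = facet.out(st,wac,wou,wnx,cou,cou+l,cnx);
            s+=std::string(cou,cnx-cou);
            wac=wnx;
        }
    }

    return s;
}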

Which std::locale names are supported, and how you specify them, may be OS dependent.
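
Because locale names vary between systems, here is a sketch of a conversion that avoids locales entirely, assuming the target char encoding is UTF-8. This is a different technique from the codecvt approach above, and the helper name toUtf8 is hypothetical. It assumes wchar_t holds UTF-16 code units (as on Windows) or UTF-32 code points (as on most Unix systems).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encodes a wide string as UTF-8 without any locale.
std::string toUtf8(const std::wstring& wide)
{
    std::string out;
    for (std::size_t i = 0; i < wide.size(); ++i)
    {
        unsigned long cp = static_cast<unsigned long>(wide[i]);

        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < wide.size())
        {
            const unsigned long low = static_cast<unsigned long>(wide[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }

        // Unpaired surrogates and out-of-range values become U+FFFD.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;

        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

This is only useful where the OS accepts UTF-8 file names. The narrow-character file functions on Windows generally expect the active code page instead, so there a wide-character API is the safer choice.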

Upvotes: 1
