Anh Tran

Reputation: 105

Converting a char[] containing Unicode characters to std::filesystem::path

To simplify the problem I'm trying to solve, let's say I'm building a CLI that checks whether a path exists using std::filesystem::exists(path). The path comes from user input.

Here are three constraints:

  1. I'm developing on Windows.
  2. I cannot use wmain because I don't have access to the main function (imagine the CLI is a third-party software and I'm writing a plugin to it).
  3. argc and argv are passed to my function with exactly the signature shown in the code snippet below.

Here's an example of the code I wrote:

#include <iostream>
#include <filesystem>

void my_func(int argc, char *argv[]) {
    // This is the only place I can do my works...
    if (argc > 1)
    {
        bool is_exist = std::filesystem::exists(argv[1]);

        std::cout << "The path (" <<  argv[1] << ") existence is: " << is_exist << std::endl;
    }
    else
    {
        std::cout << "No path defined" << std::endl;
    }
    
}

int main(int argc, char *argv[]) {
    my_func(argc, argv);
    return 0;  // 0 signals success to the shell
}

The user can use the software with the following command:

./a.exe path/to/my/folder/狗猫

Currently, garbage is printed to the terminal, but from my research this is not a C++ problem but rather a cmd.exe problem.

And in case it isn't obvious, the code above does not work even though a folder called 狗猫 exists.

My guess is that I have to convert the char[] to a filesystem::path manually somehow. Any help is greatly appreciated.

Upvotes: 2

Views: 955

Answers (1)

rustyx

Reputation: 85316

There is a way to get a wchar_t *argv[] without wmain, if you're willing to change your API slightly: call GetCommandLineW and split the result with CommandLineToArgvW. You can then construct std::filesystem::path directly from the wide strings.
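A minimal sketch of that approach, assuming a Windows build (MSVC or MinGW) linked against Shell32.lib, where CommandLineToArgvW is declared in shellapi.h. The names wide_args and first_arg_path are hypothetical helpers, not part of any API. The sketch also assumes the host program's own command-line arguments line up with the ones the plugin cares about, since GetCommandLineW returns the whole process command line rather than the argv the host passes to my_func. The non-Windows branch only exists so the snippet compiles elsewhere; it returns no arguments.

```cpp
#include <filesystem>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>  // CommandLineToArgvW (link with Shell32.lib)
#endif

// Hypothetical helper: the process arguments as UTF-16 strings, read from
// GetCommandLineW() instead of the narrow argv handed to my_func.
std::vector<std::wstring> wide_args() {
    std::vector<std::wstring> result;
#ifdef _WIN32
    int count = 0;
    LPWSTR* wargv = CommandLineToArgvW(GetCommandLineW(), &count);
    if (wargv != nullptr) {
        for (int i = 0; i < count; ++i) {
            result.emplace_back(wargv[i]);
        }
        LocalFree(wargv);  // the returned array must be released with LocalFree
    }
#endif
    // On non-Windows platforms this sketch returns an empty vector
    return result;
}

// Hypothetical helper: the first user argument as a path, or an empty path.
// Assumes index 1 of the process command line is the argument the user typed.
std::filesystem::path first_arg_path() {
    std::vector<std::wstring> args = wide_args();
    if (args.size() > 1) {
        // wchar_t overload: on Windows the UTF-16 string is stored as-is
        return std::filesystem::path(args[1]);
    }
    return std::filesystem::path();
}
```

Because the wide strings never pass through a code page, characters such as 狗猫 survive regardless of what chcp is set to.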

With char *argv[] you're bound to the user's current console code page, and you can only hope that it supports the characters you're interested in (characters outside the code page will be irrecoverably corrupted). For example, for Shift-JIS, run chcp 932 before starting the program.

Then follow these steps:

  1. Get the current console code page with GetConsoleCP,
  2. Convert the char string to UTF-16 with MultiByteToWideChar,
  3. Use the wchar_t overload to construct std::filesystem::path.

Code example:

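#include <windows.h>
#include <filesystem>
#include <string>

// Inside my_func, after checking that argc > 1:
const char* mbStr = argv[1];

UINT mbCP = GetConsoleCP();  // step 1: current console input code page
// Query the required length; with -1 the count includes the terminating null
int wLen = MultiByteToWideChar(mbCP, 0, mbStr, -1, nullptr, 0);
if (wLen > 0) {
    std::wstring wStr(wLen, L'\0');
    MultiByteToWideChar(mbCP, 0, mbStr, -1, wStr.data(), wLen);  // step 2
    wStr.resize(wLen - 1);  // drop the null terminator so it is not part of the path

    std::filesystem::path myPath(wStr);  // step 3: wchar_t overload

    if (std::filesystem::exists(myPath)) {
        // . . .
    }
}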
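Wrapped into a reusable helper and plugged into the question's my_func signature, the three steps above might look like the following sketch. path_from_argv is a hypothetical name; it assumes, as the answer does, that argv is encoded in the console input code page, and it throws std::runtime_error if the conversion fails. The #else branch is not part of the answer: it only lets the snippet compile on non-Windows systems, where narrow paths are passed through unchanged.

```cpp
#include <filesystem>
#include <iostream>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: decode a narrow argv entry into a std::filesystem::path.
std::filesystem::path path_from_argv(const char* arg) {
#ifdef _WIN32
    // Step 1: assume argv is encoded in the console input code page
    UINT cp = GetConsoleCP();
    // Step 2: convert to UTF-16; the first call returns the length incl. the null terminator
    int len = MultiByteToWideChar(cp, 0, arg, -1, nullptr, 0);
    if (len <= 0) {
        throw std::runtime_error("MultiByteToWideChar failed");
    }
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(cp, 0, arg, -1, wide.data(), len);
    wide.resize(len - 1);  // drop the terminator so it does not end up in the path
    // Step 3: construct the path from the wide string
    return std::filesystem::path(wide);
#else
    // Not from the answer: on POSIX the narrow bytes are used as-is
    return std::filesystem::path(arg);
#endif
}

// The question's entry point, using the helper instead of passing argv[1] directly
void my_func(int argc, char* argv[]) {
    if (argc > 1) {
        std::filesystem::path p = path_from_argv(argv[1]);
        bool is_exist = std::filesystem::exists(p);
        std::cout << "Path exists: " << std::boolalpha << is_exist << std::endl;
    } else {
        std::cout << "No path defined" << std::endl;
    }
}
```

For example, calling my_func with "." as argv[1] prints `Path exists: true`. Keeping the conversion in one function means that if the console code page turns out to be the wrong assumption for a given host, only path_from_argv needs to change.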

Upvotes: 1
