ZeroDefect

Reputation: 795

Cross-platform way to handle std::string/std::wstring with std::filesystem::path

I have a sample piece of C++ code that is throwing an exception on Linux:

namespace fs = std::filesystem;
const fs::path pathDir(L"/var/media");
const fs::path pathMedia = pathDir / L"COMPACTO - Diogo Poças.mxf"; // <-- Exception thrown here

The exception thrown is: filesystem error: Cannot convert character sequence: Invalid or incomplete multibyte or wide character

I surmise that the issue is related to the use of the ç character.

  1. Why is this wide string (wchar_t) an "invalid or incomplete multibyte or wide character"?
  2. Going forward, how do I make related code cross-platform so that it runs on both Windows and Linux?
    • Are there helper functions I need to use?
    • What rules do I need to enforce from a programmer's PoV?
    • I've seen a response here that says "Don't use wide strings on Linux". Do the same rules apply on Windows?

Linux Environment (not forgetting the fact that I'd like to run cross-platform):

Upvotes: 7

Views: 6438

Answers (2)

plexando

Reputation: 1261

Looks like a GCC bug.

According to the documentation for std::filesystem::path::path, you should be able to call the std::filesystem::path constructor with a wide-character string, regardless of the underlying platform (that's the whole point of std::filesystem).

Clang shows correct behavior.

Upvotes: 5

Barmak Shemirani

Reputation: 31599

Unfortunately std::filesystem was not written with operating system compatibility in mind, at least not as advertised.

For Unix-based systems, we need UTF-8 (u8"string", or just "string" depending on the compiler).

For Windows, we need UTF-16 (L"string").

In C++17 you can use filesystem::u8path (deprecated in C++20, where a char8_t string can be passed to the path constructor directly). On Windows, this converts UTF-8 to UTF-16, so the resulting path can be passed to the native UTF-16 APIs.

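#include <filesystem>
#include <iostream>
#ifdef _WIN32              // predefined by Windows compilers
#include <cstdio>          // _fileno
#include <fcntl.h>         // _O_WTEXT
#include <io.h>            // _setmode
#endif
namespace fs = std::filesystem;

int main()
{
#ifdef _WIN32
    // Windows I/O setup: put the console streams into UTF-16 mode
    _setmode(_fileno(stdin), _O_WTEXT);
    _setmode(_fileno(stdout), _O_WTEXT);
#endif

    fs::path path = fs::u8path(u8"ελληνικά.txt");

#ifdef _WIN32
    std::wcout << "UTF16: " << path << std::endl;
#else
    std::cout <<  "UTF8:  " << path << std::endl;
#endif
}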
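
Since u8path is deprecated in C++20, one option is to hide the difference behind a small helper. The following is only a sketch: path_from_utf8 and question_media_path are hypothetical names, not standard functions, and the helper assumes the caller really passes UTF-8 bytes. The ç is written as escaped bytes so the result does not depend on the source file encoding.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not a standard function): builds a path from a
// std::string that the caller guarantees to hold UTF-8.
inline fs::path path_from_utf8(const std::string& utf8)
{
#if defined(__cpp_lib_char8_t)
    // C++20: char8_t sequences are always interpreted as UTF-8.
    return fs::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: u8path converts UTF-8 to the native encoding.
    return fs::u8path(utf8);
#endif
}

// Rebuilds the path from the question without any wide strings.
// "\xC3\xA7" is the UTF-8 encoding of U+00E7; the literal is split so
// the hex escape does not swallow the following 'a'.
inline fs::path question_media_path()
{
    const fs::path dir = path_from_utf8("/var/media");
    const fs::path media = dir / path_from_utf8("COMPACTO - Diogo Po\xC3\xA7" "as.mxf");
    assert(media.has_filename());
    return media;
}
```

On Linux the native encoding is already narrow, so no wide-to-narrow conversion takes place here; on Windows the same call converts UTF-8 to UTF-16.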

Or use your own macro to select UTF-16 on Windows (L"string") and UTF-8 on Unix-based systems (u8"string" or just "string"). Make sure UNICODE is defined on Windows.

#ifdef _WIN32                     // predefined by Windows compilers
#define _TEXT(quote) L##quote     // UTF-16 wide literal
#define _tcout std::wcout
#else
#define _TEXT(quote) u8##quote    // UTF-8 literal
#define _tcout std::cout
#endif

fs::path path(_TEXT("ελληνικά.txt"));
_tcout << path << std::endl;

See also
https://en.cppreference.com/w/cpp/filesystem/path/native
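
Whereas native() returns the platform's own string type (std::string on POSIX, std::wstring on Windows), u8string() yields UTF-8 on every platform. The following sketch wraps it so the result is a plain std::string under both C++17 and C++20; utf8_bytes and check_utf8_bytes are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: returns the path as UTF-8 bytes in a std::string.
inline std::string utf8_bytes(const fs::path& p)
{
    // u8string() returns std::string in C++17 and std::u8string in C++20;
    // copying through iterators works for both.
    const auto u8 = p.u8string();
    return std::string(u8.begin(), u8.end());
}

// Small self-check: an ASCII path comes back unchanged.
inline void check_utf8_bytes()
{
    assert(utf8_bytes(fs::path("abc.txt")) == "abc.txt");
}
```

A string obtained this way can be logged or stored in the same form on both platforms, and converted back to a path only at the point where the file system is touched.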


Note that Visual Studio provides a std::fstream constructor that accepts a UTF-16 filename, and the stream can still be used to read and write UTF-8 content. For example, the following code works in Visual Studio:

fs::path utf16 = fs::u8path(u8"UTF8 filename ελληνικά.txt");
std::ofstream fout(utf16);
fout << u8"UTF8 content ελληνικά";

I am not sure whether that is supported by recent GCC versions running on Windows.
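
For what it's worth, the std::ofstream and std::ifstream constructors that take a std::filesystem::path are part of standard C++17, so the same idea can be sketched without compiler-specific extensions. The following is a minimal illustration, assuming the target file system accepts UTF-8 file names; write_then_read is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: writes `content` to `file`, then reads the first
// line back so the caller can compare the bytes.
inline std::string write_then_read(const fs::path& file, const std::string& content)
{
    {
        // C++17 overload that takes a std::filesystem::path directly.
        std::ofstream out(file, std::ios::binary);
        assert(out.is_open());
        out << content;
    }   // leaving the scope closes the stream and flushes the data

    std::ifstream in(file, std::ios::binary);
    std::string line;
    std::getline(in, line);
    return line;
}
```

Opening the streams in binary mode keeps the UTF-8 bytes untouched on platforms that would otherwise translate line endings.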

Upvotes: 5
