Reputation: 795
I have a sample piece of C++ code that is throwing an exception on Linux:
namespace fs = std::filesystem;
const fs::path pathDir(L"/var/media");
const fs::path pathMedia = pathDir / L"COMPACTO - Diogo Poças.mxf"; // <-- Exception thrown here
The exception being thrown is: filesystem error: Cannot convert character sequence: Invalid or incomplete multibyte or wide character
I surmise that the issue is related to the use of the ç character.
Linux Environment (not forgetting the fact that I'd like to run cross-platform):
Upvotes: 7
Views: 6438
Reputation: 1261
Looks like a GCC bug.
According to the documentation for std::filesystem::path::path, you should be able to call the std::filesystem::path constructor with a wide-character string regardless of the underlying platform (that's the whole point of std::filesystem).
Clang shows correct behavior.
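Until the GCC behavior is sorted out, one possible workaround on Linux is to skip the wide-to-narrow conversion entirely. On POSIX systems the native path encoding is narrow char, so a plain narrow literal is stored as-is. The sketch below assumes the source file and execution character set are UTF-8 (the GCC and Clang default); make_media_path is a hypothetical helper name, not something from the question. On Windows a narrow literal is interpreted in the native narrow encoding, so this is not a cross-platform fix by itself.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper mirroring the question's code, but with narrow literals.
// Assumption: the literals are UTF-8 encoded, which on POSIX is passed through
// to the native path unchanged (no character conversion takes place).
inline fs::path make_media_path() {
    const fs::path pathDir("/var/media");
    return pathDir / "COMPACTO - Diogo Poças.mxf";
}
```

Calling make_media_path().string() should then give back the same bytes that were in the literals.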
Upvotes: 5
Reputation: 31599
Unfortunately, std::filesystem was not written with operating system compatibility in mind, at least not as advertised.
For Unix-based systems, we need UTF-8 (u8"string", or just "string" depending on the compiler).
For Windows, we need UTF-16 (L"string").
In C++17 you can use filesystem::u8path (which for some reason is deprecated in C++20). On Windows, this converts UTF-8 to UTF-16, which you can then pass to the Windows APIs.
#ifdef _WINDOWS_PLATFORM
//windows I/O setup
_setmode(_fileno(stdin), _O_WTEXT);
_setmode(_fileno(stdout), _O_WTEXT);
#endif
fs::path path = fs::u8path(u8"ελληνικά.txt");
#ifdef _WINDOWS_PLATFORM
std::wcout << "UTF16: " << path << std::endl;
#else
std::cout << "UTF8: " << path << std::endl;
#endif
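Since u8path is deprecated in C++20, a version that builds under either standard may be useful. The sketch below is one way to do it, assuming the compiler defines the __cpp_char8_t feature-test macro when u8 literals have type char8_t; greek_path is a hypothetical helper name. In C++20 the fs::path constructor accepts char8_t input directly and treats it as UTF-8.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: build a path from a UTF-8 literal under C++17 or C++20.
inline fs::path greek_path() {
#if defined(__cpp_char8_t)
    // C++20: u8"" literals are char8_t; fs::path takes them as UTF-8 input.
    return fs::path(u8"ελληνικά.txt");
#else
    // C++17: u8"" literals are plain char; u8path marks them as UTF-8.
    return fs::u8path(u8"ελληνικά.txt");
#endif
}
```

Note that path::u8string() returns std::string in C++17 and std::u8string in C++20, so comparing it against a u8 literal works in both modes.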
Or use your own macro to select UTF-16 for Windows (L"string") and UTF-8 for Unix-based systems (u8"string" or just "string"). Make sure UNICODE is defined for Windows.
#ifdef _WINDOWS_PLATFORM
#define _TEXT(quote) L##quote
#define _tcout std::wcout
#else
#define _TEXT(quote) u8##quote
#define _tcout std::cout
#endif
fs::path path(_TEXT("ελληνικά.txt"));
_tcout << path << std::endl;
See also https://en.cppreference.com/w/cpp/filesystem/path/native and std::fstream, which allows using a UTF-16 filename and is compatible with UTF-8 read/write. For example, the following code will work in Visual Studio:
fs::path utf16 = fs::u8path(u8"UTF8 filename ελληνικά.txt");
std::ofstream fout(utf16);
fout << u8"UTF8 content ελληνικά";
I am not sure whether that's supported by the latest GCC versions running on Windows.
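A simple way to check a given toolchain is a write-then-read round trip. The sketch below is POSIX-oriented and hedged accordingly: it assumes UTF-8 source and execution character sets, uses plain narrow literals (streaming a u8 literal into a char stream no longer compiles in C++20), and round_trip_utf8 is a hypothetical helper name. On Windows the filename would need to be built with u8path or a wide literal, as described above.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: write UTF-8 bytes to a file with a non-ASCII name
// inside `dir`, read them back, delete the file, and return what was read.
inline std::string round_trip_utf8(const fs::path& dir) {
    // Assumption: these narrow literals are UTF-8 encoded, which matches
    // the native path encoding on typical Linux systems.
    const fs::path file = dir / "ελληνικά.txt";
    const std::string content = "UTF8 content ελληνικά";

    {
        // Binary mode keeps the bytes exactly as written.
        std::ofstream fout(file, std::ios::binary);
        fout << content;
    } // fout is flushed and closed here

    std::ifstream fin(file, std::ios::binary);
    std::string read_back((std::istreambuf_iterator<char>(fin)),
                          std::istreambuf_iterator<char>());
    fin.close();
    fs::remove(file); // clean up the temporary file
    return read_back;
}
```

Passing fs::temp_directory_path() as dir keeps the test file out of the working directory.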
Upvotes: 5