Mikhail

Reputation: 21749

Change narrow string encoding or missing std::filesystem::path::imbue

I'm on Windows and I'm constructing a std::filesystem::path from a std::string. According to the constructor reference (emphasis mine):

If the source character type is char, the encoding of the source is assumed to be the native narrow encoding (so no conversion takes place on POSIX systems)

If I understand correctly, this means the string content will be treated as encoded in the active ANSI code page under Windows. To have it treated as UTF-8, I need to use the std::filesystem::u8path() function. See the demo: http://rextester.com/PXRH65151

I want the path constructor to treat the contents of a narrow string as UTF-8 encoded. With boost::filesystem::path I could use the imbue() method to do this:

boost::filesystem::path::imbue(std::locale(std::locale(), new std::codecvt_utf8_utf16<wchar_t>()));

However, I do not see such a method in std::filesystem::path. Is there a way to achieve this behavior for std::filesystem::path? Or do I need to spit u8path all over the place?

Upvotes: 5

Views: 2086

Answers (2)

Nicol Bolas

Reputation: 473232

For the sake of performance, path does not have a global way to define locale conversions. Since C++ before C++20 has no dedicated type for UTF-8 strings, the library assumes that any char string is in the native narrow encoding. So if you want to use UTF-8 strings, you have to spell it out explicitly, either by passing an appropriate conversion locale to the constructor or by using u8path.

C++20 gave us char8_t, which is always presumed to be UTF-8. So if you consistently use char8_t-based strings (like std::u8string), path's implicit conversion will pick up on it and work appropriately.
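To illustrate both options in one place, here is a minimal sketch of a helper that picks the char8_t route when the compiler supports it and falls back to u8path otherwise (u8path is deprecated as of C++20). The names path_from_utf8 and utf8_from_path are hypothetical, not part of the standard library, and the caller is assumed to pass valid UTF-8.

```cpp
#include <filesystem>
#include <string>
#include <string_view>

namespace fs = std::filesystem;

// Hypothetical helper: builds a path from bytes the caller promises are UTF-8.
inline fs::path path_from_utf8(std::string_view utf8)
{
#if defined(__cpp_char8_t)
    // C++20: a char8_t source is always interpreted as UTF-8.
    return fs::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: u8path performs the same conversion for char input.
    return fs::u8path(utf8.begin(), utf8.end());
#endif
}

// Hypothetical helper: returns the path as UTF-8 bytes in a std::string.
inline std::string utf8_from_path(const fs::path& p)
{
#if defined(__cpp_char8_t)
    // C++20: u8string() returns std::u8string, so copy the bytes over.
    const std::u8string s = p.u8string();
    return std::string(s.begin(), s.end());
#else
    // C++17: u8string() already returns std::string.
    return p.u8string();
#endif
}
```

Calling path_from_utf8("caf\xC3\xA9.txt") should then give a path that refers to café.txt regardless of the active code page, and utf8_from_path returns the same bytes back.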

Upvotes: 0

ceztko

Reputation: 15207

My solution to this problem is to fully alias std::filesystem in a different namespace named std::u8filesystem, with classes and methods that treat std::string as UTF-8 encoded. Each class inherits from its counterpart of the same name in std::filesystem, without adding any field or virtual method, to offer full API/ABI interoperability. Full proof of concept code here, tested only on Windows so far and far from complete. The following snippet shows the core of the helper:

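#include <filesystem>
#include <string>

// Converts UTF-8 to the native wide encoding (definition not shown here).
std::wstring U8ToW(const std::string &string);

// Note: adding declarations to namespace std is formally undefined behavior;
// a namespace of your own avoids that.
namespace std
{
    namespace u8filesystem
    {

    #ifdef _WIN32
        class path : public filesystem::path
        {
        public:
            // Treat the narrow string as UTF-8 rather than the ANSI code page.
            path(const std::string &string)
                : filesystem::path(U8ToW(string))
            {
            }

            // Hides filesystem::path::string() so callers get UTF-8 back.
            // C++17 only: from C++20 on, u8string() returns std::u8string.
            std::string string() const
            {
                return filesystem::path::u8string();
            }
        };
    #else
        // Elsewhere the native narrow encoding is assumed to already be UTF-8.
        using namespace filesystem;
    #endif
    }
}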
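The snippet only declares U8ToW. As a stand-in for experimenting, here is a minimal portable sketch of what such a function could look like; it is an assumption, not the code from the proof of concept, and on Windows the Win32 function MultiByteToWideChar with CP_UTF8 would be the more usual choice. It produces UTF-16 where wchar_t is 16 bits wide and UTF-32 elsewhere, and it does not reject overlong encodings or encoded surrogates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Sketch only: decodes UTF-8 into wchar_t units, throwing on malformed input.
std::wstring U8ToW(const std::string &utf8)
{
    std::wstring out;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // continuation bytes expected after the lead byte
        if (lead < 0x80)              { cp = lead; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("U8ToW: invalid lead byte");

        if (extra > utf8.size() - i - 1)
            throw std::invalid_argument("U8ToW: truncated sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("U8ToW: invalid continuation byte");
            cp = (cp << 6) | (c & 0x3F); // append the 6 payload bits
        }
        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t: encode as a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}
```

With this definition, U8ToW("caf\xC3\xA9") yields the wide string L"caf\u00E9", which the path constructor above can then take without any code page conversion.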

Upvotes: 2
