Violet Giraffe

Reputation: 33579

Working with file path strings on Linux, what encoding to use?

What encoding does Linux use for its file APIs, and how should I handle path strings in C++? Which class should I use? I mean paths with non-ASCII characters. On Windows I use UTF-16 and std::wstring; on Mac, UTF-8 and my own UTF-8 string class. Unfortunately my class is not available on Linux, so what should I use there?

Upvotes: 2

Views: 3742

Answers (3)

mvp

Reputation: 116108

Internally, Linux allows any byte sequence as a file name, except that it may not contain the null byte (0) or the forward slash '/' (which is used as the directory separator).
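As a rough illustration of that rule, here is a minimal sketch that checks a single path component. The function name is_valid_component is made up for this example, and the check assumes C++17 (std::string_view). It deliberately ignores length limits and encoding, since the kernel does not care about encoding.

```cpp
#include <string_view>

// Hypothetical helper: returns true if `name` could serve as one path
// component on Linux, i.e. it is non-empty and contains neither of the
// two reserved bytes ('/' and the null byte). Encoding is not checked.
bool is_valid_component(std::string_view name) {
    if (name.empty()) return false;
    for (char c : name) {
        if (c == '/' || c == '\0') return false;
    }
    return true;
}
```

Note that a byte sequence which is not valid UTF-8 still passes this check, which matches the point above: any bytes are allowed apart from those two.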

The common convention for Unicode file names on Linux is to encode them as UTF-8. The easiest way to do that is to use good old std::string (not std::wstring, which is what is suggested on Windows). However, you may need to write your own class if you want to validate that the contents are indeed valid UTF-8.
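If you only need the validation part, something along these lines may be enough. This is a sketch, not a replacement for a tested library: is_valid_utf8 is a name invented here, and it checks well-formedness only (no overlong forms, no surrogates, nothing above U+10FFFF), not normalization.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
bool is_valid_utf8(const std::string& s) {
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;              // stray continuation or invalid lead byte
        if (n - i < len) return false;  // sequence is cut off at the end
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp[len]) return false;              // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}
```

A wrapper class could call this in its constructor and reject or flag invalid input, while still storing the bytes in a plain std::string.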

Here are a few ready-to-use libraries that handle UTF-8 strings:

  • ICU (robust but very heavy).
  • Glib::ustring (part of glibmm; has implicit conversions to std::string; LGPL).
  • UTF8-CPP (very lightweight, header-only).

Upvotes: 7

Prof. Falken

Reputation: 24867

Linux does not enforce an encoding on file names. Using UTF-8 is common though.

Upvotes: 1

code_fodder

Reputation: 16331

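You can still use the standard type wchar_t (with %ls in printf/scanf). This type lets you store non-ASCII characters; on Linux it is 32 bits wide. Keep in mind that the POSIX file functions such as open() and fopen() take char strings, so a wide string has to be converted to a multibyte encoding (typically UTF-8) before it can be used as a path.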

wchar_t mystring[50] = L"sometext";

Note that a wide string literal needs the prefix L. Remember that wchar_t is not the same type as char, so it's a bit funny to use :o
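Since the POSIX file functions (open(), fopen() and so on) take char strings, a wide string needs converting before it can be used as a path. Below is a minimal sketch of such a conversion to UTF-8. The name wide_to_utf8 is invented for this example, and it assumes each wchar_t holds one Unicode code point, which is the case for the 32-bit wchar_t on Linux but not for UTF-16 wchar_t on Windows.

```cpp
#include <string>

// Hypothetical helper: encodes a wide string as UTF-8.
// Assumption: one wchar_t == one Unicode code point (32-bit wchar_t).
// Surrogates and values above U+10FFFF are replaced with U+FFFD.
std::string wide_to_utf8(const std::wstring& in) {
    std::string out;
    for (wchar_t wc : in) {
        unsigned long cp = static_cast<unsigned long>(wc);
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
        if (cp < 0x80) {
            out += static_cast<char>(cp);                          // 1 byte
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));            // 2 bytes
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));           // 3 bytes
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));           // 4 bytes
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The result's c_str() can then be passed to the char-based file functions. The standard wcstombs() does a similar job but depends on the current locale, which is why this sketch encodes by hand.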

Upvotes: 0
