Reputation: 269
I couldn't find anything in the C++ standard saying that codecvt facets are compatible with mbtowc. And the C standard specifies mbtowc as:
If the function determines that the next multibyte character is complete and valid, it determines the value of the corresponding wide character and then, if pwc is not a null pointer, stores that value in the object pointed to by pwc.
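To make the quoted behaviour concrete, here is a minimal sketch of a single mbtowc call. The helper name decode_first is hypothetical (it is not part of any library), and the sketch assumes the program is still in the default "C" locale, where an ASCII byte maps to the wide character with the same value.

```c
#include <stdlib.h>

/* Hypothetical helper: decode the first multibyte character of `s`.
   Returns what mbtowc returns: the number of bytes consumed, 0 if `s`
   points at the null character, or -1 if the bytes do not form a valid
   character in the current LC_CTYPE locale. */
int decode_first(const char *s, wchar_t *out)
{
    mbtowc(NULL, NULL, 0);              /* reset any shift state */
    return mbtowc(out, s, MB_CUR_MAX);  /* inspect at most one character */
}
```

Calling decode_first("A", &wc) in the "C" locale should consume one byte and store L'A' in wc. Which wide value you get for non-ASCII input is exactly the part that depends on the locale and the implementation.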
But what does it mean by "value of the corresponding wide character"? Is it affected by locale? The definition of wide character says
wide character
value representable by an object of type wchar_t, capable of representing any character in the current locale.
but later it "redefines" the "current locale" as an implementation-defined one.
The value of a wide character constant containing a single multibyte character that maps to a single member of the extended execution character set is the wide character corresponding to that multibyte character, as defined by the mbtowc, mbrtoc16, or mbrtoc32 function as appropriate for its type, with an implementation-defined current locale.
As this answer says, wide-exec-charset has nothing to do with the C library functions, but some C++ APIs such as filesystem::path still take advantage of it.
Now I'm really confused: what is the encoding used by the multibyte/wide character conversion functions? Is it locale-dependent or implementation-defined? Or is it somehow the same as the UCS-2 or UTF-32 used by the codecvt facets?
Upvotes: 2
Views: 461
Reputation: 11
Note: I have practically no knowledge of C++, so my answer concerns the C language. It also assumes a glibc system (that is, a system that uses the GNU C Library). Moreover, the body of your question is beyond my knowledge, so I'll answer the headline and (most of) the last paragraph of your question.
According to the GNU implementation of the standard C library:
We already said above that the currently selected locale for the LC_CTYPE category decides the conversion that is performed by the functions we are about to describe. Each locale uses its own character set (given as an argument to localedef) and this is the one assumed as the external multibyte encoding. The wide character set is always UCS-4 in the GNU C Library.
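As a rough illustration of the quoted passage, the sketch below converts a byte string to wide characters under a chosen LC_CTYPE locale using mbstowcs. The helper name decode_in_locale is hypothetical, and which locale names are installed (for example "C.UTF-8" or "en_US.UTF-8") is an assumption that varies from system to system, so the helper reports failure rather than guessing.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode `bytes` under the LC_CTYPE locale named
   `locale_name`. Returns the number of wide characters written, or
   (size_t)-1 if the locale is not installed or the bytes are not valid
   in that locale's multibyte encoding. */
size_t decode_in_locale(const char *locale_name, const char *bytes,
                        wchar_t *out, size_t cap)
{
    char saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);

    /* Copy the name: later setlocale calls may overwrite the string. */
    strncpy(saved, current ? current : "C", sizeof saved - 1);
    saved[sizeof saved - 1] = '\0';

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return (size_t)-1;              /* locale not available here */

    /* The multibyte side of this conversion is chosen by LC_CTYPE. */
    size_t n = mbstowcs(out, bytes, cap);

    setlocale(LC_CTYPE, saved);         /* restore the caller's locale */
    return n;
}
```

On a glibc system with a UTF-8 locale installed, decoding the two bytes "\xC3\xA9" this way should yield a single wide character, whereas a single-byte locale would treat the same input as two characters (or reject it).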
Answering your questions:
Is there any locale that affects wide character encoding?
No, not on glibc: locales do not specify wide character encodings; they only specify multibyte encodings.
what is the encoding used by multibyte/wide character conversion functions?
The conversion functions use the encoding defined by the locale as the multibyte encoding, and UCS-4 as the wide character encoding.
Is it locale dependent or implementation defined?
Multibyte encodings are locale-dependent. Wide character encodings are implementation-defined.
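One way to see what a given implementation promises is the standard macro __STDC_ISO_10646__, which an implementation defines when wchar_t values are ISO/IEC 10646 (Unicode) code points. The sketch below only reports what the compiler and C library claim; the function names are made up for illustration.

```c
#include <limits.h>
#include <wchar.h>

/* Returns 1 if the implementation defines __STDC_ISO_10646__, i.e. it
   promises that wchar_t holds ISO/IEC 10646 code points, 0 otherwise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Width of wchar_t in bits on this implementation. */
int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}
```

On glibc I would expect this to report 1 and a 32-bit wchar_t, which matches the "always UCS-4" statement in the manual. Other C libraries are free to answer differently.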
As for the -fwide-exec-charset compiler option, it merely determines how wide character literals are encoded in the resulting executable file. As this linked answer says, it is useful when cross-compiling for a system whose C library was built with a wide (internal) character set that differs from that of your machine's glibc.
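To show that the literal's value is fixed at compile time, here is a small sketch that returns the value the compiler stored for a wide character literal. It assumes the compiler's default wide execution character set (UTF-32 or UTF-16, depending on the width of wchar_t); the function name is hypothetical.

```c
#include <stddef.h>

/* Value the compiler stored for the wide literal L'\u00e9'. With the
   default wide execution charset this is the code point 0xE9. It is
   chosen at compile time, independent of any runtime setlocale call. */
unsigned long wide_literal_value(void)
{
    return (unsigned long)L'\u00e9';
}
```

Rebuilding the same file with a different -fwide-exec-charset is what could change this value, while the result of mbtowc at run time would still follow the C library and the current locale.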
This is a good introduction to extended characters. It explains the rationale behind internal (wide) and external (multibyte) encodings.
Upvotes: 1