$ echo $LANG
en_US.UTF-8
$ echo 你好 | iconv -f UTF8 -t UTF32BE | tee hello.txt
O`Y}
$ vim -N -u NONE --cmd 'set tenc=utf32 enc=utf32 fencs=utf32be' hello.txt
你好
~
~
~
:set tenc enc fenc
termencoding=ucs-4
encoding=ucs-4
fileencoding=ucs-4
The terminal cannot display UTF-32 characters, yet after I set several of Vim's encoding options to UTF-32, Vim still displays the file without any problems.
Why?
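The garbled O`Y} that tee printed is consistent with the file holding UTF-32BE. Each character becomes four bytes, two of which are NULs, and the remaining bytes happen to be printable ASCII. This small Python sketch reproduces the effect, assuming the terminal simply drops the NUL bytes:

```python
# Encode the same text that was piped through iconv in the question.
text = "你好\n"
utf32be = text.encode("utf-32-be")  # like iconv -t UTF32BE: big-endian, no BOM
utf8 = text.encode("utf-8")         # what a UTF-8 terminal expects

# U+4F60 and U+597D become 00 00 4f 60 and 00 00 59 7d.
print(utf32be.hex(" "))  # 00 00 4f 60 00 00 59 7d 00 00 00 0a
print(utf8.hex(" "))     # e4 bd a0 e5 a5 bd 0a

# If the terminal ignores the NUL bytes, only 0x4f 0x60 0x59 0x7d (plus the
# newline) remain, and those are the ASCII characters O ` Y }.
visible = utf32be.replace(b"\x00", b"").decode("ascii")
print(repr(visible))  # 'O`Y}\n'
```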
Interesting. You can run your command inside script(1) to verify that Vim is actually writing UTF-8 to your terminal.
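Here is a rough sketch of that check. script(1) records the raw bytes sent to the terminal (by default into a file named typescript), so you can search the log for the encoded form of 你. The helper names below are made up for illustration. Searching for a single character instead of the whole string is also an assumption, because Vim may emit cursor-movement escape sequences between characters.

```python
from pathlib import Path

# Byte patterns for the first character of the test text, 你 (U+4F60).
NI_UTF8 = "你".encode("utf-8")         # e4 bd a0
NI_UTF32BE = "你".encode("utf-32-be")  # 00 00 4f 60


def guess_terminal_encoding(log: bytes) -> str:
    """Hypothetical helper: report which encoding of 你 appears in a terminal log."""
    if NI_UTF32BE in log:
        return "utf-32be"
    if NI_UTF8 in log:
        return "utf-8"
    return "not found"


def check_typescript(path: str = "typescript") -> str:
    """Read a script(1) log (default file name) and classify its contents."""
    return guess_terminal_encoding(Path(path).read_bytes())


# A log fragment containing an escape sequence followed by UTF-8 text:
print(guess_terminal_encoding(b"\x1b[H" + "你好".encode("utf-8")))  # utf-8
```

If Vim is indeed writing UTF-8, as described here, check_typescript() run on the recorded session should return 'utf-8'.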
The help for 'charconvert' and 'encoding' gives oblique hints about the internal operation, but I did not find a corresponding hint that the same behavior applies to 'termencoding'. Respectively:
Vim internally uses UTF-8 instead of UCS-2 or UCS-4.
and
When "unicode", "ucs-2" or "ucs-4" is used, Vim internally uses utf-8.
So, we will use the source (version 7.3.548, specifically) to find out what is happening.
The value of the 'termencoding' ('tenc') option is stored in the global variable p_tenc.
did_set_string_option() seems to handle the setting of string-valued options. When handling 'termencoding', it calls convert_setup() to set up output_conv (for converting from 'encoding' to 'termencoding').
The comment for convert_setup() gives the first hint as to what is happening:
Note: cannot be used for conversion from/to ucs-2 and ucs-4 (will use utf-8 instead).
convert_setup() calls convert_setup_ext() with TRUE for both of the {from,to}_unicode_is_utf8 parameters.
Inside convert_setup_ext():
1. If {from,to}_unicode_is_utf8 are true (they are), it sets the local variables {from,to}_is_utf8 based on whether the specified encodings have the ENC_UNICODE property (ucs-4 does, as do all of Vim’s utf-… and ucs-… encodings).
2. When setting up the conversion through iconv, Vim substitutes utf-8 for an encoding if the corresponding {from,to}_is_utf8 is true (in this case, they are).
Ultimately, the values of 'encoding' and 'termencoding' are handled the same way here: utf-32 is mapped to ucs-4, which has ENC_UNICODE, so Vim replaces the requested encoding with UTF-8. There may be some hints in the commit logs about why 'termencoding' is treated this way; I will leave that archeology to someone else, though.
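The substitution logic described above can be modeled in a few lines of Python. This is a hypothetical, simplified model, not Vim's C code. The alias table covers only the names used in the question, and ENC_UNICODE is approximated by the rule stated above (every utf-… and ucs-… encoding has it).

```python
from __future__ import annotations

# Simplified, hypothetical model of the name resolution described above.
# Not Vim's actual code: ALIASES is an assumed subset of Vim's alias table.
ALIASES = {"utf32": "ucs-4", "utf-32": "ucs-4", "utf32be": "ucs-4"}


def canonical(name: str) -> str:
    """Map an encoding name to the spelling that :set reports."""
    name = name.lower()
    return ALIASES.get(name, name)


def has_enc_unicode(name: str) -> bool:
    """Approximate ENC_UNICODE: true for every utf-... and ucs-... encoding."""
    return name.startswith(("utf-", "ucs-"))


def convert_setup_ext(frm: str, from_unicode_is_utf8: bool,
                      to: str, to_unicode_is_utf8: bool) -> tuple[str, str]:
    """Return the effective (from, to) encodings after the UTF-8 substitution."""
    frm, to = canonical(frm), canonical(to)
    from_is_utf8 = from_unicode_is_utf8 and has_enc_unicode(frm)
    to_is_utf8 = to_unicode_is_utf8 and has_enc_unicode(to)
    return ("utf-8" if from_is_utf8 else frm,
            "utf-8" if to_is_utf8 else to)


def convert_setup(frm: str, to: str) -> tuple[str, str]:
    """Like convert_setup(): pass TRUE for both *_unicode_is_utf8 flags."""
    return convert_setup_ext(frm, True, to, True)


# output_conv converts 'encoding' to 'termencoding'; both are utf32 here.
print(convert_setup("utf32", "utf32"))  # ('utf-8', 'utf-8')
```

In this model both sides resolve to utf-8, so the conversion from 'encoding' to 'termencoding' is a pass-through and the terminal receives Vim's internal UTF-8. That matches what the question observed.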
The code path for handling 'fileencoding' is different: it forces UTF-8 only for the “internal side” of the conversion (and only if a “Unicode” 'encoding' is in effect).
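For contrast, here is a sketch of the 'fileencoding' side under the same caveats. Only the internal side is forced to UTF-8, so the file side keeps the real ucs-4 byte layout. The sketch assumes that Vim's ucs-4 means big-endian UCS-4 without a BOM, which matches the UTF32BE file in the question. Python's utf-32-be codec stands in for iconv, and the function names are hypothetical.

```python
# Assumed mapping from Vim encoding names to Python codecs (ucs-4 = big-endian).
PY_CODECS = {"ucs-4": "utf-32-be", "utf-8": "utf-8"}


def read_into_buffer(raw: bytes, fenc: str) -> bytes:
    """File side uses 'fileencoding'; the internal side is always UTF-8."""
    return raw.decode(PY_CODECS[fenc]).encode("utf-8")


def write_from_buffer(internal: bytes, fenc: str) -> bytes:
    """Reverse direction: internal UTF-8 back to the file's real encoding."""
    return internal.decode("utf-8").encode(PY_CODECS[fenc])


raw = "你好\n".encode("utf-32-be")         # contents of hello.txt (12 bytes)
internal = read_into_buffer(raw, "ucs-4")  # buffer text as UTF-8 (7 bytes)
print(len(raw), len(internal))             # 12 7
print(write_from_buffer(internal, "ucs-4") == raw)  # True
```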