kev

Reputation: 161954

How does terminal encoding work in vim?

In GNOME Terminal (3.4.1.1):

$ echo $LANG
en_US.UTF-8

$ echo 你好 | iconv -f UTF8 -t UTF32BE | tee hello.txt
O`Y}

In Vim (7.3):

$ vim -N -u NONE --cmd 'set tenc=utf32 enc=utf32 fencs=utf32be' hello.txt
你好
~
~
~    

:set tenc enc fenc
  termencoding=ucs-4
  encoding=ucs-4
  fileencoding=ucs-4

The terminal cannot display UTF-32 characters.
Yet after I set several of Vim's encoding options to UTF-32,
Vim still displays the UTF-32 file without any problems.
Why?

Upvotes: 3

Views: 2080

Answers (1)

Chris Johnsen

Reputation: 224989

Interesting. You can run your command inside script (the terminal session recorder) to verify that Vim is actually writing UTF-8 to your terminal.
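
A rough sketch of that check follows. It assumes the util-linux version of script (its -q and -c options; BSD script takes the command differently), and the file name vim.typescript and the helper name has_utf8_nihao are made up for illustration. The helper looks for the UTF-8 byte sequence of 你好 (e4 bd a0 e5 a5 bd) in the recorded output, assuming Vim writes the two characters back to back.

```shell
# Hypothetical helper: succeed if FILE contains the UTF-8 bytes for 你好
# (e4 bd a0 e5 a5 bd, written here as octal escapes), fail otherwise.
has_utf8_nihao() {
    LC_ALL=C grep -q "$(printf '\344\275\240\345\245\275')" "$1"
}

# Record everything Vim writes to the terminal, then quit Vim with :q.
# Assumes util-linux script(1); uncomment to run interactively:
# script -q -c "vim -N -u NONE --cmd 'set tenc=utf32 enc=utf32 fencs=utf32be' hello.txt" vim.typescript
# has_utf8_nihao vim.typescript && echo "Vim wrote UTF-8 to the terminal"
```

If the helper succeeds on the recording, the terminal received UTF-8 even though termencoding reports ucs-4.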

The help entries for 'charconvert' and 'encoding' give oblique hints about the internal operation, but I did not find a corresponding hint that the same behavior applies to termencoding. Respectively:

Vim internally uses UTF-8 instead of UCS-2 or UCS-4.

and

When "unicode", "ucs-2" or "ucs-4" is used, Vim internally uses utf-8.

So, we will use the source (version 7.3.548, specifically) to find out what is happening.

The value for the termencoding/tenc option is stored in the global variable p_tenc.

  • did_set_string_option() seems to handle the setting of string-valued options.

    • When handling termencoding, it calls convert_setup() to set up output_conv (for converting encoding to termencoding).

      The comment for convert_setup gives the first hint as to what is happening:

      Note: cannot be used for conversion from/to ucs-2 and ucs-4 (will use utf-8 instead).

      • convert_setup calls convert_setup_ext() with TRUE for both of the {from,to}_unicode_is_utf8 parameters.

        • When {from,to}_unicode_is_utf8 are true (they are), it sets the local variables {from,to}_is_utf8 based on whether the specified encodings have the ENC_UNICODE property (ucs-4 does, as do all of Vim’s utf-… and ucs-… encodings).
          When it comes time to open an iconv, Vim substitutes utf-8 if {from,to}_is_utf8 are true (in this case, they are).

Ultimately, the values of encoding and termencoding are handled in the same way here. utf-32 is mapped to ucs-4, which has ENC_UNICODE, so Vim replaces the requested encoding with UTF-8. Maybe there are some hints in the commit logs that indicate why termencoding is treated this way; I will leave that archeology to someone else, though.
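
The substitution can be modeled in a few lines. This is only a sketch of the behavior described above, not Vim's code: the function name iconv_name_for is hypothetical, the pattern match on names stands in for the ENC_UNICODE property check, and the input is assumed to be spelled the way Vim canonicalizes it (utf-32, ucs-4, latin1, and so on).

```shell
# Hypothetical model (not Vim's code): print the encoding name that ends up
# being used for the conversion when the user asks for ENC.
iconv_name_for() {
    enc=$1
    # utf-32 is an alias that is mapped to ucs-4 first.
    if [ "$enc" = "utf-32" ]; then
        enc=ucs-4
    fi
    case "$enc" in
        # Stand-in for the ENC_UNICODE check: the utf-... and ucs-... names.
        utf-*|ucs-*) echo utf-8 ;;
        # Anything else is passed through unchanged.
        *) echo "$enc" ;;
    esac
}
```

With this model, `iconv_name_for utf-32` and `iconv_name_for ucs-4` both print utf-8, while `iconv_name_for latin1` prints latin1 unchanged.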

The code path for handling fileencoding is different. It only forces UTF-8 for the “internal side” of the conversion (and only if a “Unicode” encoding is in effect).
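
The two sides of that conversion can be imitated outside Vim with iconv, as a stand-in for what happens when the file is read. The helper names below are illustrative, and this only mimics the byte-level conversion, not Vim's own code path.

```shell
# Hypothetical helpers; the function names are illustrative.

# File side: hex dump of TEXT encoded as UTF-32BE (what hello.txt stores).
utf32be_hex() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-32BE | od -An -tx1 | tr -d ' \n'
}

# Internal side: convert a UTF-32BE file to UTF-8, as a stand-in for the
# conversion performed when the file is read.
utf32be_to_utf8() {
    iconv -f UTF-32BE -t UTF-8 "$1"
}
```

`utf32be_hex '你好'` prints 00004f600000597d. The non-zero bytes 4f 60 59 7d are the ASCII characters O, `, Y and }, which is why the terminal showed O`Y} when the raw UTF-32BE bytes were echoed in the question.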

Upvotes: 4
