Frozen Flame

Reputation: 3245

Rationale of fileencoding and encoding in vim or elsewhere

I don't understand why Vim has both encoding and fileencoding.

As far as I know, a file is just an array of bytes. When we create a text file, we create an array of characters (or symbols), encode that character array with encoding X into an array of bytes, and save the byte array to disk. When a text editor reads the file, it decodes the byte array with encoding X to reconstruct the original character array, and displays each character with a glyph according to the font. Only one encoding is involved in this process.
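The single-encoding round trip described above can be sketched in Python. The sample string and the choice of UTF-8 as "encoding X" are illustrative assumptions, not anything taken from Vim.

```python
# Illustrative round trip: characters -> bytes -> characters, one encoding.
text = "na\u00efve caf\u00e9"      # the "array of characters" (10 characters)
encoding = "utf-8"                 # "encoding X"; an assumed choice

data = text.encode(encoding)       # the byte array that would be saved to disk
restored = data.decode(encoding)   # what an editor reconstructs on load

print(len(text), len(data))        # character count vs. byte count
print(restored == text)            # True when the same encoding is used both ways
```

The byte count is larger than the character count here because the two accented letters each take two bytes in UTF-8.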

In "VIM set encoding and fileencoding utf-8", which cites the Vim wiki page on working with Unicode, the two options are described as follows:

encoding sets how vim shall represent characters internally. Utf-8 is necessary for most flavors of Unicode.

fileencoding sets the encoding for a particular file (local to buffer)

"How vim shall represent characters internally" vs. "encoding for a particular file"... Does this resemble the distinction between Unicode and UTF-8? If so, why should a user bother with the former?

Any hint?

Upvotes: 4

Views: 313

Answers (2)

deceze

Reputation: 522522

I'll preface this by saying that I'm not a vim expert by any means.

I think the flaw in your thinking is here:

When a text editor reads the file, it decodes the byte array with encoding X to reconstruct the original character array, and displays each character with a glyph according to the font.

The thing is, vim is not responsible for rendering the glyph here. vim reads bytes from a file, stores them internally and sends bytes to the terminal which renders the glyph using a font. vim itself never touches fonts and hence never really needs to understand "characters". It only needs to work with bytes internally which it moves back and forth between files, internal buffers and the terminal.

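Hence, there are three possible different byte storages involved:

  • the file on disk ('fileencoding')
  • vim's internal buffers ('encoding')
  • the terminal ('termencoding')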

vim will convert between those as necessary. It could read from a Shift-JIS encoded file, store the data internally as UTF-16 and send/receive I/O to/from the terminal in UTF-8. I am not sure why you'd want to change the internal byte handling of vim (again, not an expert), but in any case, you can alter that setting if you want to.
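As a rough analogy only (this is not how vim is implemented), the three-way conversion can be sketched in Python, with a Python str standing in for the internal representation. The constant and function names are hypothetical and chosen for illustration.

```python
# Hypothetical stand-ins for two of the three byte stores; the third,
# the internal representation, is played by Python's own str type.
FILE_ENCODING = "shift_jis"   # plays the role of 'fileencoding'
TERM_ENCODING = "utf-8"       # plays the role of 'termencoding'


def load(file_bytes: bytes) -> str:
    """Decode bytes read from the file into the internal representation."""
    return file_bytes.decode(FILE_ENCODING)


def to_terminal(internal: str) -> bytes:
    """Encode internal text into the bytes sent to the terminal."""
    return internal.encode(TERM_ENCODING)


def save(internal: str) -> bytes:
    """Encode internal text back into the file's own encoding."""
    return internal.encode(FILE_ENCODING)


# Three kanji, written with escapes so the source stays ASCII-only.
file_bytes = "\u65e5\u672c\u8a9e".encode("shift_jis")
internal = load(file_bytes)
term_bytes = to_terminal(internal)

print(len(file_bytes), len(term_bytes))   # same text, different byte lengths
print(save(internal) == file_bytes)       # saving restores the original bytes
```

The same three characters occupy a different number of bytes in the file and on the way to the terminal, which is why a conversion step is needed at each boundary.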

Hypothesising follows: If you set encoding to a Unicode encoding, you can safely handle any character you may encounter. However, on very limited systems, text held in a Unicode encoding may take up too much memory to work with comfortably, so in that case you may want to use a more specialised encoding if you know what you're doing.
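To give a feel for the size argument, here is a small Python sketch that compares how many bytes the same text needs in a few encodings. The helper name and the sample strings are assumptions made for illustration; the numbers say nothing about vim's actual memory use.

```python
def encoded_size(text: str, codec: str) -> int:
    """Return the number of bytes `text` occupies when encoded with `codec`."""
    return len(text.encode(codec))


CODECS = ("latin-1", "utf-8", "utf-16-le", "utf-32-le")

for sample in ("hello", "\u00e9"):   # plain ASCII, then a single accented letter
    sizes = {codec: encoded_size(sample, codec) for codec in CODECS}
    print(repr(sample), sizes)
```

For mostly Western text, a single-byte encoding such as Latin-1 is the most compact, at the cost of not being able to represent characters outside its repertoire.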

Upvotes: 5

Ingo Karkat

Reputation: 172718

You're right; most programs have a fixed internal encoding. In terms of C data types, that is either char (which usually follows the underlying locale and may then be unable to represent all characters, or which holds UTF-8) or wchar_t (wide characters), which can represent the Unicode range. The choice is mainly driven by the programming language and the available APIs, since converting back and forth is tedious and inefficient.

Because Vim supports a large variety of platforms (starting with the old Amiga, where development began) and is geared towards programmers and highly advanced users, it allows you to configure the internal representation.

Heuristics

  • As long as all characters are recognizable, you don't need to care.
  • If certain files don't look right, you have to teach Vim to recognize the encoding via 'fileencodings', or explicitly specify it.
  • If certain characters do not show up right, you need to switch the 'encoding'. With utf-8, you're on the safe side.
  • If you have problems in the terminal only, fiddle with 'termencoding'.
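The second heuristic relies on trying candidate encodings in order until one fits. A minimal Python sketch of that idea follows; the function name and the candidate list are hypothetical, and this is an analogy to what 'fileencodings' does, not Vim's actual detection code.

```python
from typing import Iterable, Tuple


def decode_with_candidates(data: bytes, candidates: Iterable[str]) -> Tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes without error."""
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue   # this candidate does not fit; try the next one
    raise ValueError("no candidate encoding could decode the data")


# Order matters: strict encodings first, and latin-1 last because it
# accepts every possible byte sequence and so never fails.
CANDIDATES = ("utf-8", "shift_jis", "latin-1")

sjis_bytes = "\u65e5\u672c\u8a9e".encode("shift_jis")
text, detected = decode_with_candidates(sjis_bytes, CANDIDATES)
print(detected)
```

An encoding that never fails, such as Latin-1, only makes sense at the end of the list, because nothing after it would ever be tried.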

As you can see, though it can be confusing to the beginner, you actually have all the power available to you!

Upvotes: 6
