katosh

Reputation: 382

How to fix wrong text file encoding?

I have a text file that claims to be UTF-8 encoded: when I call file -I $file, it prints $file: text/plain; charset=utf-8. But when I open it with UTF-8 encoding, some characters appear corrupted. The file is supposed to be German, but special German characters like ö are displayed as Ã¶.

I guessed that the UTF-8 claim was wrong and ran the enca script to guess the real encoding. Unfortunately, enca reports that the language de (German) is not supported.

Is there another way to fix the file?

Upvotes: 5

Views: 7440

Answers (3)

Ben

Reputation: 8905

To get a file to read properly in a given encoding, you need three things:

  1. 'encoding', which controls the characters Vim can store and display, must be able to represent all the characters in your file.
  2. 'fileencodings', which controls the encodings Vim will attempt to recognize, must be set so that your file's encoding is recognized.
  3. 'fileencoding' must be set to the encoding your file is actually stored in. Normally this happens automatically through detection via the 'fileencodings' setting.

Note that (2) is not strictly necessary, but if the file encoding is detected incorrectly, you will need to re-read the file manually in the correct encoding, for example with :e ++enc=utf-8 for a UTF-8 file that was not detected as such.

See http://vim.wikia.com/wiki/Working_with_Unicode for getting all three of these concepts correct.
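
The idea behind 'fileencodings' is that Vim tries a list of encodings in order and uses the first one that converts without errors. The Python sketch below imitates that idea outside Vim; it is not Vim's actual algorithm, and the candidate list (UTF-8 first, then Latin-1 as a catch-all) and the function name detect_encoding are assumptions chosen for illustration.

```python
# Sketch: imitate the 'fileencodings' idea by trying candidate encodings
# in order and returning the first one that decodes the bytes cleanly.
# The candidate list is an illustrative assumption, not Vim's exact logic.

CANDIDATES = ("utf-8", "latin-1")


def detect_encoding(data: bytes, candidates=CANDIDATES) -> str:
    """Return the first candidate encoding that decodes `data` without errors."""
    for enc in candidates:
        try:
            data.decode(enc)  # strict mode raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        return enc
    raise ValueError("no candidate encoding matched")


if __name__ == "__main__":
    utf8_bytes = "Größe".encode("utf-8")      # b'Gr\xc3\xb6\xc3\x9fe'
    latin1_bytes = "Größe".encode("latin-1")  # b'Gr\xf6\xdfe'
    print(detect_encoding(utf8_bytes))    # utf-8
    print(detect_encoding(latin1_bytes))  # latin-1
```

Because Latin-1 maps every byte to some character, it never fails and only makes sense as the last entry. This also shows the limit of detection: a double-encoded file is still valid UTF-8, so it is "correctly" detected as UTF-8 and still displays wrong characters.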

Upvotes: 3

Jukka K. Korpela

Reputation: 201568

The UTF-8 encoded form of “ö” U+00F6 is 0xC3 0xB6, and if these bytes are interpreted as ISO-8859-1, they are “Ã¶” (U+00C3 U+00B6). So either the file is actually being read and interpreted as ISO-8859-1, even though you expect otherwise, or there has been a double encoding: at some point the file, or part of it, was read as if it were ISO-8859-1 (even though it was UTF-8), and the misinterpreted data was then written out UTF-8 encoded.
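
The byte-level reasoning above can be checked directly in Python. This sketch reproduces the "ö" to "Ã¶" mojibake, shows what a double-encoded file would contain, and reverses one level of the damage. It assumes the damage was exactly one UTF-8 to ISO-8859-1 misreading; the helper name undo_double_encoding is hypothetical.

```python
# Sketch: reproduce the mojibake described above and undo one level of
# double encoding. Assumes exactly one UTF-8 -> ISO-8859-1 misreading.

original = "ö"                          # U+00F6
utf8_bytes = original.encode("utf-8")   # b'\xc3\xb6'
misread = utf8_bytes.decode("latin-1")  # 'Ã¶' (U+00C3 U+00B6)
print(utf8_bytes, misread)

# If the misread text was then saved as UTF-8, the file holds these bytes:
double_encoded = misread.encode("utf-8")  # b'\xc3\x83\xc2\xb6'
print(double_encoded)


def undo_double_encoding(text: str) -> str:
    """Hypothetical helper: turn text like 'Ã¶' back into 'ö' by
    re-encoding it as ISO-8859-1 and decoding those bytes as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


print(undo_double_encoding("Ã¶"))  # ö
```

The repair only works if the whole string was damaged the same way: a correctly encoded "ö" mixed into the text becomes the single byte 0xF6 after the Latin-1 step and then fails to decode as UTF-8, and characters outside ISO-8859-1 (such as €) cannot be encoded as Latin-1 at all.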

Upvotes: 4

Alexandre DuBreuil

Reputation: 5631

You can also check the encoding with :set encoding and set it accordingly with :set encoding=utf-8. If you still see incorrect characters, that means those characters were not written to the file as UTF-8, and you will need to convert them.
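
Converting means decoding the stored bytes with the encoding they were really written in and re-encoding them as UTF-8. A minimal Python sketch, assuming the source encoding is ISO-8859-1 (for German text it could also be Windows-1252, for example); the function name convert_to_utf8 is hypothetical.

```python
# Sketch: re-encode a file from an assumed single-byte source encoding
# (ISO-8859-1 by default) to UTF-8, in place.
import os
import tempfile


def convert_to_utf8(path: str, source_encoding: str = "latin-1") -> None:
    """Hypothetical helper: rewrite the file at `path` as UTF-8."""
    with open(path, "rb") as f:
        raw = f.read()
    text = raw.decode(source_encoding)  # interpret the bytes as they were written
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))   # store the same characters as UTF-8


if __name__ == "__main__":
    fd, tmp = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    with open(tmp, "wb") as f:
        f.write("Größe\n".encode("latin-1"))
    convert_to_utf8(tmp)
    with open(tmp, "rb") as f:
        print(f.read())  # b'Gr\xc3\xb6\xc3\x9fe\n'
    os.remove(tmp)
```

Reading and writing raw bytes with explicit decode/encode calls avoids depending on the platform's default text encoding, which is exactly the kind of silent assumption that causes this problem in the first place.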

EDIT: If you could share your file, it would help.

Upvotes: 2
