How to convert saved text file encoding to UTF8?

Question

I recently saved a text file on my computer, but when I opened it again I saw strings like:

 "˜ÌÇí ÍÑÝã ÚÌíÈå¿"

Is it possible to convert it back to the original text (UTF-8)?

I tried this code, but it doesn't work:

  string tempStr="˜ÌÇí ÍÑÝã ÚÌíÈå¿"; 
  Encoding ANSI = Encoding.GetEncoding(1256);
  byte[] ansiBytes = ANSI.GetBytes(tempStr);
  byte[] utf8Bytes = Encoding.Convert(ANSI, Encoding.UTF8, ansiBytes);
  String utf8String = Encoding.UTF8.GetString(utf8Bytes);
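The attempt above cannot work because the string has already been decoded with the wrong code page. `ANSI.GetBytes(tempStr)` turns the Latin characters into code-page-1256 bytes, substituting whatever 1256 cannot represent, and `Encoding.Convert` followed by `Encoding.UTF8.GetString` only changes how those same characters are stored. A minimal sketch, assuming .NET Core 3.0+ / .NET 5+ (hence the `CodePagesEncodingProvider` registration) and C# 9 top-level statements, that checks this:

```csharp
using System;
using System.Text;

// Needed on .NET Core 3.0+ / .NET 5+ before code page 1256 can be used.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

string tempStr = "˜ÌÇí ÍÑÝã ÚÌíÈå¿";
Encoding ansi = Encoding.GetEncoding(1256);

// The steps from the question, unchanged.
byte[] ansiBytes = ansi.GetBytes(tempStr);
byte[] utf8Bytes = Encoding.Convert(ansi, Encoding.UTF8, ansiBytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);

// Convert re-encodes the same characters, so the result equals a plain
// code page 1256 decode of ansiBytes: nothing has been "repaired".
string direct = ansi.GetString(ansiBytes);
Console.WriteLine(utf8String == direct); // True
```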

xanatos · Accepted Answer

You can use something like:

string str = Encoding.GetEncoding(1256).GetString(Encoding.GetEncoding("iso-8859-1").GetBytes(tempStr));

The string wasn't really decoded: each of its bytes was simply "enlarged" to a char, with something like:

byte[] bytes = ...
char[] chars = new char[bytes.Length];
for (int i = 0; i < bytes.Length; i++)
{
    chars[i] = (char)bytes[i]; // byte 0xNN becomes the char U+00NN (C# needs the explicit cast)
}
string str = new string(chars);

This transformation is exactly what decoding with code page ISO-8859-1 does. So to undo it I could either have written the reverse myself or let that code page do it for me; I chose the second option:

Encoding.GetEncoding("iso-8859-1").GetBytes(tempStr)

This gave me the original byte[].
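Both directions of that equivalence can be checked over all 256 byte values: widening bytes as in the loop above matches ISO-8859-1 decoding, and narrowing the chars back matches ISO-8859-1 encoding. A minimal sketch using C# 9 top-level statements; `WidenBytes` and `NarrowChars` are hypothetical helper names, and the narrowing is only lossless for chars up to U+00FF:

```csharp
using System;
using System.Linq;
using System.Text;

// ISO-8859-1 is built into .NET, so no provider registration is needed here.
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

// Every possible byte value, 0x00 through 0xFF.
byte[] allBytes = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();

string widened = WidenBytes(allBytes);
Console.WriteLine(widened == latin1.GetString(allBytes));                         // True
Console.WriteLine(NarrowChars(widened).SequenceEqual(latin1.GetBytes(widened))); // True

// Hypothetical helper: the "enlarging" loop from the answer.
static string WidenBytes(byte[] bytes)
{
    char[] chars = new char[bytes.Length];
    for (int i = 0; i < bytes.Length; i++)
    {
        chars[i] = (char)bytes[i]; // byte 0xNN becomes U+00NN
    }
    return new string(chars);
}

// Hypothetical helper: the reverse, truncating each char to its low byte.
static byte[] NarrowChars(string s) => s.Select(c => (byte)c).ToArray();
```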

Then I ran some tests, and it seems the original text wasn't UTF-8: it was in code page 1256, an Arabic code page. So I decoded the bytes with that code page:

string str = Encoding.GetEncoding(1256).GetString(...);

The only problem is the leading ˜: ISO-8859-1 has no byte for it, so it doesn't map back to any part of the original string.
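Put together, the ISO-8859-1 then 1256 repair can be sketched as below, assuming .NET Core 3.0+ / .NET 5+ (hence the provider registration) and C# 9 top-level statements. Because ˜ (U+02DC) has no ISO-8859-1 byte, the encoder substitutes a fallback character for it, so only the first character of the result is wrong:

```csharp
using System;
using System.Text;

// Needed on .NET Core 3.0+ / .NET 5+ before code page 1256 can be used.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

string tempStr = "˜ÌÇí ÍÑÝã ÚÌíÈå¿";

// Step 1: undo the byte-to-char widening to get the original bytes back.
byte[] originalBytes = Encoding.GetEncoding("iso-8859-1").GetBytes(tempStr);

// Step 2: decode those bytes with the Arabic code page they were written in.
string str = Encoding.GetEncoding(1256).GetString(originalBytes);

// Characters 2..16 come back as Arabic-script letters; the first one is
// whatever fallback ISO-8859-1 used for the unrepresentable ˜.
Console.WriteLine(str);
```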

There is another possibility:

string str = Encoding.GetEncoding(1256).GetString(Encoding.GetEncoding(1252).GetBytes(tempStr));

Code page 1252 is the code page used in the USA and in a large part of Europe; if your Windows is configured for English, there is a good chance it uses 1252 as the default code page. The result is slightly different from the ISO-8859-1 version: 1252 does have a byte for ˜ (0x98), and code page 1256 decodes 0x98 as ک, so the first character is recovered too.
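A side-by-side comparison of the two variants, as a minimal sketch under the same .NET Core 3.0+ / C# 9 assumptions. The expected string is written with `\u` escapes only so that each Arabic-script character is unambiguous:

```csharp
using System;
using System.Text;

// Needed on .NET Core 3.0+ / .NET 5+ for code pages 1252 and 1256.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

string tempStr = "˜ÌÇí ÍÑÝã ÚÌíÈå¿";
Encoding arabic = Encoding.GetEncoding(1256);

string viaLatin1 = arabic.GetString(Encoding.GetEncoding("iso-8859-1").GetBytes(tempStr));
string via1252 = arabic.GetString(Encoding.GetEncoding(1252).GetBytes(tempStr));

// "کجاي حرفم عجيبه؟" spelled out code point by code point.
string expected = "\u06A9\u062C\u0627\u064A \u062D\u0631\u0641\u0645 \u0639\u062C\u064A\u0628\u0647\u061F";

Console.WriteLine(via1252 == expected);   // True
Console.WriteLine(viaLatin1 == expected); // False: only the first character differs
```

On .NET Framework, code pages 1252 and 1256 are available without the `RegisterProvider` call. If the file on disk still holds the original code-page-1256 bytes (an assumption; the question doesn't say how it was saved), reading it with `File.ReadAllText(path, Encoding.GetEncoding(1256))` and writing it back with `File.WriteAllText(path, text, Encoding.UTF8)` would convert it to UTF-8 directly; `path` and `text` are placeholder names.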
