Reputation: 5645
recently i saved a text file on my computer but when i open it again i saw some strings like:
"˜ÌÇí ÍÑÝã ÚÌíÈå¿"
now i want to know is it possible to reconvert it to the original text (UTF8)?
i try this codes but it doesn't works
string tempStr="˜ÌÇí ÍÑÝã ÚÌíÈå¿";
Encoding ANSI = Encoding.GetEncoding(1256);
byte[] ansiBytes = ANSI.GetBytes(tempStr);
byte[] utf8Bytes = Encoding.Convert(ANSI, Encoding.UTF8, ansiBytes);
String utf8String = Encoding.UTF8.GetString(utf8Bytes);
Upvotes: 1
Views: 646
Reputation: 111940
You can use something like:
string str = Encoding.GetEncoding(1256).GetString(Encoding.GetEncoding("iso-8859-1").GetBytes(tempStr))
The string wasn't really decoded... Its byte
s where simply "enlarged" to char
, with something like:
byte[] bytes = ...
char[] chars = new char[bytes.Length];
for (int i = 0; i < bytes.Length; i++)
{
chars[i] = bytes[i];
}
string str = new string(chars);
Now... This transformation is the same that is done by the codepage ISO-8859-1. So I could simply have done the reverse, or I could have used that codepage to do it for me, I selected the second one.
Encoding.GetEncoding("iso-8859-1").GetBytes(tempStr)
this gave me the original byte[]
Then I've done some tests and it seems that the text in the beginning wasn't UTF8, it was in codepage 1256, that is an arabic codepage. So I
string str = Encoding.GetEncoding(1256).GetString(...);
The only thing, the ˜
doesn't seem to be part of the original string.
There is another possibility:
string str = Encoding.GetEncoding(1256).GetString(Encoding.GetEncoding(1252).GetBytes(tempStr));
The codepage 1252 is the codepage used in the USA and in a big part of Europe. If you have a Windows configured to English, there is a good chance it uses the 1252 as the default codepage. The result is slightly different than using the iso-8859-1
Upvotes: 2