Reputation: 11
I have some encoded data in an .mdb file, such as Úæäí and ÚáÇä. In Notepad++, I created a new file with ANSI encoding, pasted the text into it, and then changed the encoding to Windows-1256. The result was عوني and علان, which is perfect, but I can't reproduce this in code (C#). Here is the code:
public string Decode(DataRow rw,string colName)
{
Encoding srcEnc = Encoding.GetEncoding("from what ?");
Encoding destEnc = Encoding.GetEncoding("1256");// arabic encoding
byte[] srcVal = srcEnc.GetBytes(rw[colName].ToString());
byte[] destVal = Encoding.Convert(srcEnc,destEnc,srcVal);
return destEnc.GetString(destVal);
}
Upvotes: 0
Views: 2522
Reputation: 26213
The problem is that you're converting between encodings. That isn't what you're trying to achieve: you just want to re-interpret the encoded text.
To do this, get the bytes for your ANSI string and then decode them using the correct encoding.
So, leaving out the conversion:
var latin = Encoding.GetEncoding(1252);
var bytes = latin.GetBytes("Úæäí");
var arabic = Encoding.GetEncoding(1256);
var result = arabic.GetString(bytes);
result is عوني
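Applied to the method in the question, the same idea might look like the sketch below. The class and method names (EncodingFix, Reinterpret) are hypothetical, and the call to Encoding.RegisterProvider is an assumption about the runtime: on .NET Core / .NET 5+ the Windows code pages are only available after registering CodePagesEncodingProvider, whereas on the classic .NET Framework they are built in and that line can be dropped.

```csharp
using System;
using System.Text;

public static class EncodingFix
{
    static EncodingFix()
    {
        // Assumption: running on .NET Core / .NET 5+, where code pages
        // 1252 and 1256 must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Re-interprets text that was wrongly decoded as Windows-1252
    // as Windows-1256 (Arabic). No Encoding.Convert step is involved.
    public static string Reinterpret(string misdecoded)
    {
        Encoding latin = Encoding.GetEncoding(1252);
        Encoding arabic = Encoding.GetEncoding(1256);

        // Recover the original bytes from the mis-decoded string...
        byte[] raw = latin.GetBytes(misdecoded);

        // ...and decode those same bytes with the intended code page.
        return arabic.GetString(raw);
    }
}
```

Inside the original Decode method this would be called as EncodingFix.Reinterpret(rw[colName].ToString()), assuming the column really was read as Windows-1252.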
A caveat, as Hans points out in the comments: Windows-1252 has 5 byte values that are unused (0x81, 0x8D, 0x8F, 0x90, and 0x9D). If these correspond to characters in Windows-1256 that were used in the original text, then your source data is corrupted, as these characters will have been lost on the initial decoding using 1252. Ideally, you want to start with the original encoded source.
Upvotes: 5