saimmm

Reputation: 11

Can't reproduce Notepad++'s ANSI to Windows-1256 re-encoding in C#

I have some encoded data in an .mdb file, such as Úæäí and ÚáÇä. In Notepad++ I created a new file with ANSI encoding, pasted the text into it, and then changed the encoding to Windows-1256. The result is عوني and علان, which is correct, but I can't reproduce this in code (C#). Here is the code:

public string Decode(DataRow rw,string colName)
{
   Encoding srcEnc = Encoding.GetEncoding("from what ?");
   Encoding destEnc = Encoding.GetEncoding("1256"); // Arabic encoding
   byte[] srcVal = srcEnc.GetBytes(rw[colName].ToString());
   byte[] destVal = Encoding.Convert(srcEnc,destEnc,srcVal);
   return destEnc.GetString(destVal);
}

Upvotes: 0

Views: 2522

Answers (1)

Charles Mager

Reputation: 26213

The problem is that you're converting between encodings. That isn't what you're trying to achieve: you just want to re-interpret the bytes of the already encoded text.

To do this, get the bytes of your ANSI string using the encoding it was wrongly decoded with (Windows-1252), then decode those bytes using the correct encoding (Windows-1256).

So, leaving out the conversion:

var latin = Encoding.GetEncoding(1252);
var bytes = latin.GetBytes("Úæäí");

var arabic = Encoding.GetEncoding(1256);            
var result = arabic.GetString(bytes);   

result is عوني
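Applied to the method from the question, the same idea might look like the sketch below. The class name MojibakeFixer and the helper Reinterpret are hypothetical, and the sketch assumes the column text was originally mis-decoded as Windows-1252. The RegisterProvider call is only needed on .NET Core / .NET 5+, where code-page encodings are not available by default; on .NET Framework it can be dropped.

```csharp
using System;
using System.Data;
using System.Text;

public static class MojibakeFixer
{
    // Hypothetical helper: re-interprets text that was wrongly decoded
    // as Windows-1252 when it was really Windows-1256.
    public static string Reinterpret(string garbled)
    {
        // Needed on .NET Core / .NET 5+; unnecessary on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding latin = Encoding.GetEncoding(1252);   // how the text was mis-read
        Encoding arabic = Encoding.GetEncoding(1256);  // what the bytes really are

        // Recover the raw bytes, then decode them with the right code page.
        // No Encoding.Convert step is involved.
        byte[] raw = latin.GetBytes(garbled);
        return arabic.GetString(raw);
    }

    // Same signature as the method in the question.
    public static string Decode(DataRow rw, string colName)
    {
        return Reinterpret(rw[colName].ToString());
    }
}
```

With the two samples from the question, Reinterpret("Úæäí") should give عوني and Reinterpret("ÚáÇä") should give علان.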

A caveat, as Hans points out in the comments: Windows-1252 has 5 byte values that are unused (0x81, 0x8D, 0x8F, 0x90, and 0x9D). If any of these bytes correspond to Windows-1256 characters used in the original text, your source data is corrupted, because those characters will have been lost in the initial decoding as 1252. Ideally, you want to start with the original encoded source.
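One way to surface some of that loss is to encode with a strict fallback, so that a character with no Windows-1252 byte throws instead of silently becoming '?'. This is a sketch only, and StrictReinterpreter and TryReinterpret are hypothetical names. It assumes lost characters show up as something outside Windows-1252, such as U+FFFD. If the earlier decoding replaced them with a literal '?', this check cannot detect it.

```csharp
using System;
using System.Text;

public static class StrictReinterpreter
{
    // Hypothetical helper: returns false when the garbled text contains a
    // character that cannot be mapped back to a Windows-1252 byte.
    public static bool TryReinterpret(string garbled, out string result)
    {
        // Needed on .NET Core / .NET 5+; unnecessary on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Exception fallbacks make unmappable characters throw
        // instead of being replaced.
        Encoding strictLatin = Encoding.GetEncoding(
            1252, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
        Encoding arabic = Encoding.GetEncoding(1256);

        try
        {
            byte[] raw = strictLatin.GetBytes(garbled);
            result = arabic.GetString(raw);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // At least one character has no Windows-1252 representation.
            result = null;
            return false;
        }
    }
}
```

A false return means the stored string is not a clean Windows-1252 mis-decoding, so the value should be re-read from the original bytes rather than repaired.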

Upvotes: 5
