Reputation: 9278
I have some hex data:
48|65|6c|6c|6f|20|53|68|61|72|6f|6b|2e|
20|3d|43|46|3d|46|30|3d|45|38|3d|45|32|3d|45|35|3d|46|32|0d|0a|0d|0a|2e|0d|0a|
The first text string is "Hello Sharok" (without quotes). The second text string should be "Привет" (without quotes; "Привет" is "Hello" in Russian). How do I convert this to readable text? The first string decodes fine, but the second one fails.
Code page: Windows-1251 (CP1251)
Upvotes: 1
Views: 2552
Reputation: 700332
Create an Encoding object for the windows-1251 encoding, and decode the byte array:
byte[] data = {
0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x53, 0x68, 0x61, 0x72, 0x6f, 0x6b, 0x2e
};
string text = Encoding.GetEncoding(1251).GetString(data);
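The question gives the bytes as a pipe-delimited hex dump rather than as a C# array literal. A minimal sketch of turning such a dump into a byte[] and decoding it follows, assuming the dump is available as a string. The helper name ParseHexDump is made up for illustration. On .NET Core and .NET 5+, code page 1251 is only available after registering CodePagesEncodingProvider; on .NET Framework that line is not needed (and requires the System.Text.Encoding.CodePages package to compile).

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

static class HexDumpDemo
{
    // Hypothetical helper: "48|65|6c|" -> { 0x48, 0x65, 0x6c }
    static byte[] ParseHexDump(string dump)
    {
        return dump
            // RemoveEmptyEntries tolerates the trailing '|' in the question's data
            .Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => byte.Parse(s, NumberStyles.HexNumber, CultureInfo.InvariantCulture))
            .ToArray();
    }

    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+; drop this line on .NET Framework
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] data = ParseHexDump("48|65|6c|6c|6f|20|53|68|61|72|6f|6b|2e|");
        string text = Encoding.GetEncoding(1251).GetString(data);
        Console.WriteLine(text); // Hello Sharok.
    }
}
```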
The second set of data doesn't decode into Russian characters, but into this (including a space at the start and a line break (CR+LF) ending each of the three lines):
=CF=F0=E8=E2=E5=F2
.
To get the string that you want, you would first have to decode the data into a string, then extract the hexadecimal codes from the string, convert those into bytes, and decode those bytes:
Encoding win = Encoding.GetEncoding(1251);
string text = win.GetString(
Regex.Matches(win.GetString(data), "=(..)")
.OfType<Match>()
.Select(m => Convert.ToByte(m.Groups[1].Value, 16))
.ToArray()
);
Upvotes: 1
Reputation: 108800
For the second one you can use this:
string input = "20|3d|43|46|3d|46|30|3d|45|38|3d|45|32|3d|45|35|3d|46|32|0d|0a|0d|0a|2e|0d|0a";
byte[] bytes = input.Split('|').Select(s => byte.Parse(s, System.Globalization.NumberStyles.HexNumber)).ToArray();
string text = Encoding.GetEncoding(1251).GetString(bytes);
StringBuilder text2 = new StringBuilder();
for (int i = 0; i < text.Length; i++)
{
    if (text[i] == '=')
    {
        // '=' introduces two hex digits that encode one Windows-1251 byte
        string hex = text[i + 1].ToString() + text[i + 2].ToString();
        byte b = byte.Parse(hex, System.Globalization.NumberStyles.HexNumber);
        text2.Append(Encoding.GetEncoding(1251).GetString(new byte[] { b }));
        i += 2;
    }
    else
    {
        text2.Append(text[i]);
    }
}
First it decodes the |-separated string, which then contains =-escaped hex values that the following loop decodes.
Upvotes: 2
Reputation: 16761
The second string is not raw Windows-1251 but quoted-printable text (" =CF=F0=E8=E2=E5=F2<CR><LF><CR><LF>.<CR><LF>"), and the bytes it encodes are Windows-1251. So you need to iterate over the string and build the output one character at a time. When you run into the escape sign (=), the next two characters are the hex digits of a Windows-1251 byte. Decode the two digits and append the resulting character to the output string. Loop until the end.
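The loop described above can be sketched as follows. This is one possible shape, not a complete quoted-printable implementation: the helper names DecodeQuotedPrintable and HexValue are hypothetical, the soft-line-break handling ("=" directly followed by CR+LF) is an addition based on the quoted-printable format and is not exercised by the question's data, and the provider registration assumes .NET Core or .NET 5+.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class QuotedPrintableDemo
{
    // Returns 0..15 for an ASCII hex digit, or -1 for anything else
    static int HexValue(byte b)
    {
        if (b >= '0' && b <= '9') return b - '0';
        if (b >= 'A' && b <= 'F') return b - 'A' + 10;
        if (b >= 'a' && b <= 'f') return b - 'a' + 10;
        return -1;
    }

    // Hypothetical helper: undo the =XX escapes, then decode the bytes in the given charset
    static string DecodeQuotedPrintable(byte[] raw, Encoding charset)
    {
        var decoded = new List<byte>(raw.Length);
        for (int i = 0; i < raw.Length; i++)
        {
            if (raw[i] == (byte)'=' && i + 2 < raw.Length)
            {
                // "=" + CR + LF is a soft line break and produces no output
                if (raw[i + 1] == 0x0d && raw[i + 2] == 0x0a)
                {
                    i += 2;
                    continue;
                }
                int hi = HexValue(raw[i + 1]);
                int lo = HexValue(raw[i + 2]);
                if (hi >= 0 && lo >= 0)
                {
                    decoded.Add((byte)(hi * 16 + lo));
                    i += 2;
                    continue;
                }
            }
            // Ordinary byte, or a malformed escape that is copied through unchanged
            decoded.Add(raw[i]);
        }
        return charset.GetString(decoded.ToArray());
    }

    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+; drop this line on .NET Framework
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] raw =
        {
            0x20, 0x3d, 0x43, 0x46, 0x3d, 0x46, 0x30, 0x3d, 0x45, 0x38, 0x3d, 0x45, 0x32,
            0x3d, 0x45, 0x35, 0x3d, 0x46, 0x32, 0x0d, 0x0a, 0x0d, 0x0a, 0x2e, 0x0d, 0x0a
        };
        string text = DecodeQuotedPrintable(raw, Encoding.GetEncoding(1251));
        // text is " Привет" followed by the original line breaks and the final "."
        Console.WriteLine(text);
    }
}
```

Collecting the unescaped bytes first and decoding them in one call also works when the charset is a multi-byte one such as UTF-8, where decoding each escaped byte on its own would not.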
Upvotes: 3
Reputation: 10015
Have a look here
How can I convert a cp1251 byte array to a utf8 String?
And this is useful
http://bytes.com/topic/c-sharp/answers/274352-utf8-windows-1251-conversion
Upvotes: 1