Reputation:
I received some Russian text over the network. Here is a dump of those bytes:
When I try to interpret this as an ASCII string, of course it doesn't work. Nor does it seem to be UTF-8. Can someone help me read these bytes as a string in C#? (You can see the debugger shows the letters next to them.)
Upvotes: 0
Views: 3471
Reputation: 38164
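// Produce test data: encode a sample string as windows-1251 and write the raw bytes to a file.
// On .NET Core / .NET 5+, call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
var input = "Привет, люди!";
var utf8bytes = Encoding.UTF8.GetBytes(input);
var win1251Bytes = Encoding.Convert(Encoding.UTF8, Encoding.GetEncoding("windows-1251"), utf8bytes);
File.WriteAllBytes(@"foo.txt", win1251Bytes);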
Upvotes: 0
Reputation: 280
In general, if you know where the text comes from, you usually have some information about its encoding, so you can simply use the Encoding class: select the appropriate encoding and call GetString.
For example: Encoding.UTF8.GetString()
or: Encoding.GetEncoding(1251).GetString()
If you have no information about the encoding, that is a different task: you have to look for an encoding detection algorithm.
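As a rough illustration of the "no information" case, here is a minimal sketch of one very simple heuristic: try strict UTF-8 first, and fall back to a legacy code page if the bytes are not valid UTF-8. The helper name DecodeUtf8OrFallback and the default fallback of 1251 are assumptions made for this example, not a general detection algorithm. The RegisterProvider call assumes modern .NET, where legacy code pages come from System.Text.Encoding.CodePages.

```csharp
using System;
using System.Text;

public static class EncodingGuess
{
    // Hypothetical helper: strict UTF-8 first, then a caller-chosen legacy code page.
    public static string DecodeUtf8OrFallback(byte[] bytes, int fallbackCodePage = 1251)
    {
        // Assumption: running on .NET Core / .NET 5+, where code pages such as 1251
        // are only available after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // throwOnInvalidBytes: true makes GetString throw instead of emitting U+FFFD.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            return strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8, so decode with the fallback code page instead.
            return Encoding.GetEncoding(fallbackCodePage).GetString(bytes);
        }
    }
}
```

This only distinguishes "valid UTF-8" from "something else"; it cannot tell one single-byte code page from another, so the fallback still has to be chosen from what you know about the sender.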
Upvotes: 1
Reputation: 2218
Looks like Cyrillic, code page 1251.
var bytes = new byte[]
{
210, 240, 224, 237, 231, 224, 234, 246, 232, 255, 32, 237, 229, 32, 236, 238, 230, 229, 242, 32, 225, 251, 242
};
var text = System.Text.Encoding.GetEncoding(1251).GetString(bytes);
// text = "Транзакция не может быт"
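Not sure if there's a better way to figure it out than looping over the available code pages and seeing what looks right:
// Code page identifiers fit in 16 bits, so there is no need to go past 65535.
for (var i = 1; i <= 65535; ++i)
{
    try
    {
        // Look the encoding up once; this throws for unknown or unsupported code pages.
        var encoding = System.Text.Encoding.GetEncoding(i);
        Console.WriteLine(encoding.GetString(bytes));
        Console.WriteLine("Encoding: {0}", i);
        Console.WriteLine(encoding.EncodingName);
        Console.WriteLine();
    }
    catch
    {
        // No encoding for this code page; skip it.
    }
}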
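An alternative to probing every number is to ask the runtime which encodings it knows about via Encoding.GetEncodings(). The sketch below assumes .NET 5 or later, where registering CodePagesEncodingProvider should make the legacy code pages (including 1251) show up in that list; on other runtimes the list may be shorter. The class and method names are made up for this example.

```csharp
using System;
using System.Text;

public static class EncodingDump
{
    // Hypothetical helper: decode the same bytes with every encoding the runtime reports.
    public static void PrintAllDecodings(byte[] bytes)
    {
        // Assumption: .NET 5+; legacy code pages are only listed after this registration.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            Encoding encoding = info.GetEncoding();
            Console.WriteLine("{0} ({1}): {2}", info.CodePage, info.DisplayName, encoding.GetString(bytes));
        }
    }
}
```

You still have to eyeball the output, but the loop only visits encodings that actually exist, so no exception handling is needed for missing code pages.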
Upvotes: 1