Trying to detect the encoding of Russian text and read it as a string

I received some Russian text over the network. Here is a dump of those bytes:

When I try to interpret these bytes as an ASCII string, of course it doesn't work. They don't seem to be UTF-8 either. Can someone help me read these bytes as a string in C#? (You can see that the debugger shows the letters next to them.)

Upvotes: 0

Answers (3)

StepUp

Reputation: 38164

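using System.IO;
using System.Text;

// Opposite direction: encode a known Russian string as Windows-1251 and write it to a file.
var input = "Привет, люди!";
var utf8bytes = Encoding.UTF8.GetBytes(input);
var win1251Bytes = Encoding.Convert(Encoding.UTF8, Encoding.GetEncoding("windows-1251"), utf8bytes);
File.WriteAllBytes(@"foo.txt", win1251Bytes);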

Upvotes: 0

Pyfhon

Reputation: 280

In general, if you know where the text comes from, you usually have some information about its encoding, so you can simply use the Encoding class: select the appropriate encoding and call GetString.

For example, Encoding.UTF8.GetString() or Encoding.GetEncoding(1251).GetString().

If you have no information about the encoding, that is a different task: you have to look for an encoding-detection algorithm.
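
One simple heuristic, offered only as a sketch and not as a full detection algorithm: try a strict UTF-8 decode first, and fall back to a legacy code page if the bytes are not valid UTF-8. The helper name DecodeUtf8OrFallback and the choice of 1251 as the fallback are assumptions made for illustration. The RegisterProvider call assumes a runtime where CodePagesEncodingProvider is available (it is needed on .NET Core / .NET 5+ before legacy code pages can be used).

```csharp
using System;
using System.Text;

// Assumption: running on .NET Core / .NET 5+, where legacy code pages
// such as 1251 must be registered before GetEncoding can return them.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical helper: decode as UTF-8 if the bytes are valid UTF-8,
// otherwise decode with the given fallback code page.
static string DecodeUtf8OrFallback(byte[] data, int fallbackCodePage)
{
    // throwOnInvalidBytes: true makes GetString throw instead of
    // silently substituting U+FFFD for malformed sequences.
    var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
    try
    {
        return strictUtf8.GetString(data);
    }
    catch (DecoderFallbackException)
    {
        return Encoding.GetEncoding(fallbackCodePage).GetString(data);
    }
}

// Sample bytes that are not valid UTF-8 but are valid Windows-1251.
var sample = new byte[] { 210, 240, 224, 237, 231, 224, 234, 246, 232, 255 };
Console.WriteLine(DecodeUtf8OrFallback(sample, 1251));
```

This only separates "valid UTF-8" from "something else". It cannot tell one single-byte Cyrillic code page from another, so the fallback still has to come from what you know about the sender.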

Upvotes: 1

Kvam

Reputation: 2218

Looks like Cyrillic, code page 1251 (Windows-1251).

var bytes = new byte[]
{
    210, 240, 224, 237, 231, 224, 234, 246, 232, 255, 32, 237, 229, 32, 236, 238, 230, 229, 242, 32, 225, 251, 242
};
var text = System.Text.Encoding.GetEncoding(1251).GetString(bytes);
// text = "Транзакция не может быт"

Not sure if there's a better way to figure it out than looping over the available code pages and seeing what looks right:

// GetEncoding rejects code page numbers above 65535, so there is no point going further.
for (var i = 1; i <= 65535; ++i)
{
    try
    {
        var encoding = System.Text.Encoding.GetEncoding(i);
        Console.WriteLine(encoding.GetString(bytes));
        Console.WriteLine("Encoding: {0}", i);
        Console.WriteLine(encoding.EncodingName);
        Console.WriteLine();
    }
    catch (Exception)
    {
        // Code page i is not available on this system; skip it.
    }
}
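
If you already suspect the text is Russian, a narrower variant of the same idea is to try only a shortlist of common Cyrillic code pages instead of every number. The sketch below assumes such a shortlist (1251 for Windows-1251, 866 for CP866, 20866 for KOI8-R, 28595 for ISO-8859-5); both the list and the helper name DecodeWithCandidates are illustrative, and you still have to judge by eye which result reads correctly.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Assumption: running on .NET Core / .NET 5+, where legacy code pages
// must be registered before GetEncoding can return them.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Assumed shortlist of Cyrillic code pages; extend it as needed.
int[] cyrillicCodePages = { 1251, 866, 20866, 28595 };

// Hypothetical helper: decode the same bytes with every candidate
// code page and return the results keyed by code page number.
static Dictionary<int, string> DecodeWithCandidates(byte[] data, IEnumerable<int> codePages)
{
    var results = new Dictionary<int, string>();
    foreach (var codePage in codePages)
    {
        results[codePage] = Encoding.GetEncoding(codePage).GetString(data);
    }
    return results;
}

// First 13 bytes of the dump from the answer above.
var bytes = new byte[] { 210, 240, 224, 237, 231, 224, 234, 246, 232, 255, 32, 237, 229 };
foreach (var pair in DecodeWithCandidates(bytes, cyrillicCodePages))
{
    Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
}
```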

Upvotes: 1
