Desolator

Reputation: 22759

C# - converting UTF-8 to Ukrainian encoding

I am trying to convert this string from UTF-8 to a Ukrainian encoding: "ÐÑайвеÑ-длÑ-пÑинÑеÑа-Pixma-ip-2000-длÑ-Windows-7-64-биÑ". Whenever I convert it from UTF-8 to the Ukrainian encoding, I get a corrupted string.

The correct string should look like "Драйвер-для-принтера-Pixma-ip-2000-для-Windows-7-64-бит".

Please advise. Thanks.

EDIT: Here is how I convert it:

private string EncodeUTF8toOther(string inputString, string to)
{
    try
    {
        // Create two different encodings.
        byte[] myBytes = Encoding.Unicode.GetBytes(inputString);

        // Perform the conversion from one encoding to the other.
        byte[] convertedBytes = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding(to), myBytes);

        return Encoding.GetEncoding("ISO-8859-1").GetString(convertedBytes);
    }
    catch
    {
        return inputString;
    }
}

The Ukrainian character set is "KOI8-U".

More info: I have a similar problem to this question: "c# HttpWebResponse Header encoding".

The Location header is giving me this corrupted string. I need to decode it correctly in order to perform the redirection.

Upvotes: 1

Views: 3157

Answers (3)

Mike

Reputation: 53

"ÐÑайвеÑ-длÑ-пÑинÑеÑа-Pixma-ip-2000-длÑ-Windows-7-64-биÑ".

It's already UTF-8! You don't have to do any conversion. Just let Windows know it's UTF-8. Something like this will do the job:

wb.Encoding = Encoding.UTF8;

Upvotes: 0

Lumi

Reputation: 15294

You need to decode the string properly on input, like so:

 string str;
 // Decode the file as UTF-8; "using" closes the reader even if reading throws.
 using (StreamReader rdr = new StreamReader(args[0], Encoding.UTF8))
 {
     str = rdr.ReadToEnd();
 }

The stream is physical and you must know what encoding it is in.

The string, on the other hand, is logical. The encoding used for strings internally is of no concern to you, other than which characters it can represent; and it can represent all characters, because the internal encoding is Unicode. (If the internal encoding were KOI-8, German or French characters couldn't be represented.)

It is on output that you have to worry again about the encoding.

If you don't specify the encoding on input and output, the platform default is assumed. This might not be what you want. It's good practice to know and specify the encoding on both input and output.
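To make the input/output point concrete, here is a minimal sketch that names the encoding on both sides. The `Transcoder` class and its `Transcode` method are hypothetical names made up for this illustration; only `StreamReader`, `StreamWriter` and `Encoding` come from the framework.

```csharp
using System.IO;
using System.Text;

public static class Transcoder
{
    // Hypothetical helper: reads a text file in one encoding and
    // rewrites it in another, naming the encoding explicitly each time.
    public static void Transcode(string inputPath, Encoding inputEncoding,
                                 string outputPath, Encoding outputEncoding)
    {
        string text;

        // Input: bytes -> string, using the encoding the file is known to be in.
        using (var reader = new StreamReader(inputPath, inputEncoding))
        {
            text = reader.ReadToEnd();
        }

        // Output: string -> bytes, using the encoding the consumer expects.
        // "false" means overwrite rather than append.
        using (var writer = new StreamWriter(outputPath, false, outputEncoding))
        {
            writer.Write(text);
        }
    }
}
```

A call might look like `Transcoder.Transcode("in.txt", Encoding.UTF8, "out.txt", Encoding.GetEncoding("KOI8-U"))`, where the file names are placeholders. As far as I know, on .NET Core and later the legacy code pages such as KOI8-U are only available after registering `CodePagesEncodingProvider.Instance`, so treat that part as an assumption to check on your platform.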

Upvotes: 0

Kevin Gosse

Reputation: 39007

Encoding.Unicode is UTF-16, not UTF-8. If you're sure your source string is encoded in UTF-8, use Encoding.UTF8 instead.
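A quick way to see the difference is to compare the bytes the two encodings produce for the same character. This is only an illustrative sketch; `EncodingDemo` and its methods are hypothetical names, not part of the question's code.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Bytes of the string in UTF-16 little-endian (what Encoding.Unicode means in .NET).
    public static string Utf16Hex(string s)
    {
        return BitConverter.ToString(Encoding.Unicode.GetBytes(s));
    }

    // Bytes of the same string in UTF-8.
    public static string Utf8Hex(string s)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(s));
    }
}
```

For the Cyrillic letter "Д" (U+0414), `Utf16Hex` returns "14-04" and `Utf8Hex` returns "D0-94", so bytes produced by one cannot be read back as the other.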

And returning a string doesn't make sense: strings are always encoded in UTF-16. You should worry about the encoding only when reading and writing your string.

When reading, use Encoding.UTF8.GetString to create a UTF-16 string from the binary data.

When writing, either use Encoding.GetEncoding(destinationEncoding).GetBytes to get the binary data and write it directly, or use the overload of your StreamWriter constructor (or whatever object you're using) to specify the encoding.
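Applied to the question, a rough sketch could look like the following. It assumes the corrupted header value is UTF-8 data that was decoded as ISO-8859-1 somewhere upstream, which is what the sample string suggests but which I can't confirm from here. `HeaderRepair` and both method names are hypothetical.

```csharp
using System;
using System.Text;

public static class HeaderRepair
{
    // Assumption: "garbled" is UTF-8 data that was decoded as ISO-8859-1.
    // ISO-8859-1 maps each char U+0000..U+00FF to the byte with the same
    // value, so GetBytes recovers the original raw bytes.
    public static string RepairLatin1Mojibake(string garbled)
    {
        byte[] rawBytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(garbled);

        // Reading: binary data -> UTF-16 string.
        return Encoding.UTF8.GetString(rawBytes);
    }

    // Writing: UTF-16 string -> binary data in the destination encoding.
    public static byte[] EncodeForOutput(string text, string destinationEncoding)
    {
        return Encoding.GetEncoding(destinationEncoding).GetBytes(text);
    }
}
```

You would call `RepairLatin1Mojibake` on the raw header value, and only call `EncodeForOutput` (for example with "KOI8-U") at the point where bytes are actually written. If any characters were lost before the string reached your code, this round trip cannot bring them back.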

Upvotes: 1
