Yip Weng Tak

Reputation: 15

Converting a string encoded in utf8 to unicode in C#

I've got a string, returned via HTTP POST from a URL in a C# application, that contains some Chinese characters, e.g.:

Gelatos® Colors Gift Set中文

The problem is that I want to convert it to:

Gelatos® Colors Gift Set中文

Both strings are actually identical but encoded differently. I understand that in C# every string is UTF-16. I've read a lot of posts here about converting from one encoding to another, but had no luck.

Hope someone could help.

Here's the C# code:

WebClient wc = new WebClient();
json = wc.DownloadString("http://mysite.com/ext/export.asp");

textBox2.Text = "Receiving orders....";

// convert the string to UTF-16
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;
Encoding utf8 = Encoding.UTF8;

byte[] asciiBytes = ascii.GetBytes(json);
byte[] utf8Bytes = utf8.GetBytes(json);
byte[] unicodeBytes = Encoding.Convert(utf8, unicode, utf8Bytes);

string sOut = unicode.GetString(unicodeBytes);

System.Windows.Forms.MessageBox.Show(sOut);  //doesn't work...

Here's the code from the server:

<%@CodePage = 65001%>
<%option explicit%>
<%
Session.CodePage = 65001
Response.charset ="utf-8"
Session.LCID     = 1033 'en-US

..... response.write (strJSON)

%>

The output in the web browser is correct, so I was wondering whether something changes in the HTTP stream on its way to the C# application.

thanks.

Upvotes: 0

Views: 4367

Answers (2)

Douglas

Reputation: 54897

If the server is really returning UTF-8 text, you can configure your WebClient by setting its Encoding property. This would eliminate any need for subsequent conversions.

using (WebClient wc = new WebClient())
{
    wc.Encoding = Encoding.UTF8;
    json = wc.DownloadString("http://mysite.com/ext/export.asp");
}
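
As a rough illustration of why the extra conversion step in the question cannot repair the text: once the response has been decoded into a .NET string, encoding that string to UTF-8 bytes, converting them to UTF-16, and decoding again simply returns the same string. The sketch below assumes nothing beyond System.Text; the class and method names (RoundTripDemo, RoundTrip) are made up for the example.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // Hypothetical helper mirroring the conversion chain from the question.
    public static string RoundTrip(string s)
    {
        // string -> UTF-8 bytes
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(s);
        // UTF-8 bytes -> UTF-16 (little-endian) bytes
        byte[] utf16Bytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8Bytes);
        // UTF-16 bytes -> string; the result equals the input
        return Encoding.Unicode.GetString(utf16Bytes);
    }
}
```

So if the string is already garbled when DownloadString returns it, this chain hands back the same garbled string; the decoding has to be right at download time.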

Upvotes: 0

usr

Reputation: 171236

Download the web page as raw bytes in the first place, then decode those bytes using the correct encoding.

By first decoding the response with the wrong encoding, you are probably losing data. This is especially true of ASCII, which cannot represent any character above U+007F.
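
A minimal sketch of this approach, assuming the server really does send UTF-8 (as the ASP code in the question suggests). The names Utf8Download, Decode, DownloadUtf8 and LossyAscii are hypothetical and only serve the example; WebClient.DownloadData and Encoding.UTF8.GetString are the standard framework calls.

```csharp
using System;
using System.Net;
using System.Text;

static class Utf8Download
{
    // Decode raw response bytes as UTF-8 (assumption: the server sends UTF-8).
    public static string Decode(byte[] raw)
    {
        return Encoding.UTF8.GetString(raw);
    }

    // Fetch the body as bytes so no implicit decoding happens, then decode explicitly.
    public static string DownloadUtf8(string url)
    {
        using (var wc = new WebClient())
        {
            byte[] raw = wc.DownloadData(url);
            return Decode(raw);
        }
    }

    // Shows the data loss: with the default settings, Encoding.ASCII
    // replaces every character it cannot represent with '?'.
    public static string LossyAscii(string s)
    {
        return Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));
    }
}
```

Usage would look like json = Utf8Download.DownloadUtf8("http://mysite.com/ext/export.asp");. If the response starts with a byte order mark, the decoded string may begin with U+FEFF, which may need trimming before parsing.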

Upvotes: 1
