user1838937

Reputation: 299

How to change unicode code to char

I'm getting HTML this way:

using (var wb = new WebClient())
{
    data = soeArray;
    var response = wb.UploadValues(url, "POST", data);
    string result = System.Text.Encoding.UTF8.GetString(response);
}

But the response contains Unicode character codes like &#347; instead of the characters themselves. Is there a method I can use to convert these to the corresponding characters?

Upvotes: 3

Views: 447

Answers (2)

Windwaker

Reputation: 111

This is not as simple as you might think. The codes you are getting back are decimal Unicode code points, written as HTML numeric character references (for example, &#347; is ś). For these, you can just parse the number and convert the code point to its character.

// rawCode holds a single reference such as "&#347;"
int decCode = int.Parse(rawCode.Substring(2).TrimEnd(';'));
string s = char.ConvertFromUtf32(decCode); // also handles code points above U+FFFF

Easy, right? Wrong. Unicode characters in HTML can also be written as hex codes when the code is prefixed with &#x, as in &#xCODE; (e.g. &#x2014; represents \u2014).

Easy enough: if the code has an 'x' in front of it, we just parse it as hex, right?

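// Strip the leading "&#" and the trailing ';'
rawCode = rawCode.Substring(2).TrimEnd(';');
int codePoint;
if (rawCode[0] == 'x' || rawCode[0] == 'X') {
    // Hexadecimal reference, e.g. &#x2014;
    codePoint = Convert.ToInt32(rawCode.Substring(1), 16);
} else {
    // Decimal reference, e.g. &#347;
    codePoint = int.Parse(rawCode);
}
string s = char.ConvertFromUtf32(codePoint);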

Seems simple? Nope. Characters in HTML can also be represented by the entity name of the character (e.g. &quot; or &copy;).

You do not want to write this code yourself.

Leave it to an HTML decoder; all you need to do is something like this:

string s = System.Net.WebUtility.HtmlDecode("&copy;"); // returns ©
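
To illustrate that one call covers all three forms discussed above (decimal, hexadecimal, and named references), here is a minimal sketch. The class name EntityDemo and the Decode wrapper are hypothetical, added only to make the example self-contained; the only real API used is System.Net.WebUtility.HtmlDecode.

```csharp
using System;
using System.Net;

public static class EntityDemo
{
    // Hypothetical thin wrapper around the framework decoder.
    public static string Decode(string html) => WebUtility.HtmlDecode(html);

    public static void Main()
    {
        // One decimal, one hexadecimal and two named references.
        string encoded = "&#347; &#x2014; &copy; &quot;";
        string decoded = Decode(encoded);

        // Compare against \u escapes so the check does not depend on console encoding.
        Console.WriteLine(decoded == "\u015B \u2014 \u00A9 \"");
    }
}
```

The comparison uses escape sequences rather than printing the decoded text, since a console that is not set to a Unicode encoding may display ś or © incorrectly even when the string itself is right.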

Upvotes: 1

competent_tech

Reputation: 44971

I think what you are looking for is System.Web.HttpUtility.HtmlDecode or, if this is not a web application, System.Net.WebUtility.HtmlDecode.

For example:

string result = System.Net.WebUtility.HtmlDecode(System.Text.Encoding.UTF8.GetString(response));

Upvotes: 6
