Reputation: 299
I'm getting html this way:
using (var wb = new WebClient())
{
    data = soeArray;
    var response = wb.UploadValues(url, "POST", data);
    string result = System.Text.Encoding.UTF8.GetString(response);
}
But there are Unicode codes like &#347; in the response. Is there any method I can use to change these to the corresponding characters?
Upvotes: 3
Views: 447
Reputation: 111
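This is not as simple as you might think. The codes you are getting back are decimal Unicode code points, written as numeric character references (e.g. &#347;). For these, you can strip the leading &# and the trailing ;, parse the number, and convert it to the character with that code point (the same character you would write in C# source as \u followed by the hex code).
// rawCode is assumed to look like "&#347;"
int decCode = int.Parse(rawCode.Substring(2).TrimEnd(';'));
string hexCode = decCode.ToString("X4"); // "015B", i.e. \u015B in a C# literal
string c = char.ConvertFromUtf32(decCode);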
Easy, right? Wrong. Unicode characters in HTML can also be written as hex codes if the code is preceded by &#x (e.g. &#x2014; represents \u2014).
Easy enough: we just add logic so that if the code has an 'x' in front of it, we parse it as hex, right?
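// rawCode is assumed to look like "&#347;" or "&#x2014;"
string digits = rawCode.Substring(2).TrimEnd(';');
int codePoint;
if (digits[0] == 'x' || digits[0] == 'X') {
    // Hex form: parse the part after the 'x' as base 16
    codePoint = int.Parse(digits.Substring(1), System.Globalization.NumberStyles.HexNumber);
} else {
    // Decimal form
    codePoint = int.Parse(digits);
}
string c = char.ConvertFromUtf32(codePoint);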
Seems simple? Nope. Characters in HTML can also be represented by the entity name of the character (e.g. &quot; or &copy;).
Leave it to an HTML decoder and all you need to do is something like this.
string s = System.Net.WebUtility.HtmlDecode("&copy;"); // returns ©
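For comparison, here is a minimal sketch of what decoding only the two numeric forms by hand might look like. The class name NumericEntityDecoder and its Decode method are hypothetical names made up for this illustration, not part of any library. It deliberately leaves named entities such as &copy; untouched, which is the gap that HtmlDecode fills.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; prefer WebUtility.HtmlDecode in real code.
public static class NumericEntityDecoder
{
    // Matches &#NNN; (decimal) and &#xHHHH; (hex) references, nothing else.
    private static readonly Regex NumericRef =
        new Regex(@"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    public static string Decode(string html)
    {
        return NumericRef.Replace(html, m =>
        {
            bool isHex = m.Groups[1].Success;
            string digits = isHex ? m.Groups[1].Value : m.Groups[2].Value;
            NumberStyles style = isHex ? NumberStyles.AllowHexSpecifier : NumberStyles.None;

            int codePoint;
            bool parsed = int.TryParse(digits, style, CultureInfo.InvariantCulture, out codePoint);

            // Leave the reference as-is if it is not a valid Unicode scalar value.
            bool valid = parsed
                && codePoint > 0
                && codePoint <= 0x10FFFF
                && !(codePoint >= 0xD800 && codePoint <= 0xDFFF);

            return valid ? char.ConvertFromUtf32(codePoint) : m.Value;
        });
    }
}
```

With this sketch, NumericEntityDecoder.Decode("&#347; &copy;") gives "ś &copy;": the numeric reference is converted, while the named entity is passed through unchanged.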
Upvotes: 1
Reputation: 44971
I think what you are looking for is System.Web.HttpUtility.HtmlDecode or, if this is not a web application, System.Net.WebUtility.HtmlDecode.
For example:
string result = System.Net.WebUtility.HtmlDecode(System.Text.Encoding.UTF8.GetString(response));
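To try this without making a request, the same two steps can be wrapped in a small helper and fed sample bytes. This is only a sketch: ResponseDecoder and DecodeHtmlResponse are hypothetical names, and it assumes the server really sends UTF-8, as the question's code does.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper for illustration; 'response' stands in for the
// byte[] returned by WebClient.UploadValues in the question.
public static class ResponseDecoder
{
    public static string DecodeHtmlResponse(byte[] response)
    {
        // Step 1: bytes -> text (assumes the response is UTF-8 encoded).
        string raw = Encoding.UTF8.GetString(response);

        // Step 2: resolve decimal, hex, and named character references.
        return WebUtility.HtmlDecode(raw);
    }
}
```

For example, passing Encoding.UTF8.GetBytes("&#347; &copy; &quot;") should yield the string ś © " with all three references resolved.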
Upvotes: 6