Reputation: 378
When I use HtmlAgilityPack, I set the encoding to UTF-8. This works for some text, but in some cases it returns text like the following:
&#x6F1;&#x6F3;&#x6F9;&#x6F9;-&#x6F1;&#x6F1;-&#x6F2;&#x6F0; &#x6F2;&#x6F3;:&#x6F2;&#x6F7;
My code is roughly as follows:
HtmlWeb web2 = new HtmlWeb();
web2.AutoDetectEncoding = false;
web2.OverrideEncoding = Encoding.UTF8; // the property is Encoding.UTF8, not Encoding.UTF-8
var doc = await web2.LoadFromWebAsync(url);
date = doc.DocumentNode
    .SelectNodes("/html/body/div[2]/main/div[2]/div[2]/div[1]/div[1]/div[2]/span[1]")
    .First().InnerText;
I should add that the same problem occurs when I do not set the encoding at all.
Does anyone know where the problem is?
Upvotes: 0
Views: 50
Reputation: 76
These are HTML entities (numeric character references) representing the original text. If this is inside a web application, you can use HttpUtility.HtmlDecode from the System.Web namespace. Outside of a web application, you can use WebUtility.HtmlDecode from the System.Net namespace. Either method turns the HTML entities back into the corresponding text.
Running it through a fiddle produces:
۱۳۹۹-۱۱-۲۰ ۲۳:۲۷
https://dotnetfiddle.net/J7YXZM
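using System;
using System.Net;

public class Program
{
    public static void Main()
    {
        // Numeric character references, as returned by InnerText
        var encoded = "&#x6F1;&#x6F3;&#x6F9;&#x6F9;-&#x6F1;&#x6F1;-&#x6F2;&#x6F0; &#x6F2;&#x6F3;:&#x6F2;&#x6F7;";
        // Turn each reference back into the character it stands for
        var decoded = WebUtility.HtmlDecode(encoded);
        Console.WriteLine(decoded);
    }
}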
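Each reference such as &#x6F1; simply names a Unicode code point (here U+06F1, EXTENDED ARABIC-INDIC DIGIT ONE), which is why decoding restores the Persian digits. The sketch below decodes decimal and hexadecimal references by hand with a regular expression and compares the result with WebUtility.HtmlDecode. It is only an illustration of the idea: DecodeNumericEntities is a hypothetical helper, it ignores named entities like &amp;, and it is not how the framework implements decoding. In real code, call WebUtility.HtmlDecode.

```csharp
using System;
using System.Globalization;
using System.Net;
using System.Text.RegularExpressions;

public static class EntityDemo
{
    // Hypothetical helper (not part of any library): replaces decimal (&#1777;)
    // and hexadecimal (&#x6F1;) numeric character references with the
    // characters they stand for. Named entities such as &amp; are left as-is.
    public static string DecodeNumericEntities(string input)
    {
        return Regex.Replace(input, "&#([xX][0-9A-Fa-f]+|[0-9]+);", match =>
        {
            string body = match.Groups[1].Value;
            int codePoint = (body[0] == 'x' || body[0] == 'X')
                ? int.Parse(body.Substring(1), NumberStyles.HexNumber, CultureInfo.InvariantCulture)
                : int.Parse(body, CultureInfo.InvariantCulture);
            // ConvertFromUtf32 throws for surrogate or out-of-range code points;
            // acceptable for a sketch, not for untrusted input.
            return char.ConvertFromUtf32(codePoint);
        });
    }

    public static void Main()
    {
        // "1399" written with Extended Arabic-Indic digits
        string encoded = "&#x6F1;&#x6F3;&#x6F9;&#x6F9;";
        string manual = DecodeNumericEntities(encoded);
        string library = WebUtility.HtmlDecode(encoded);
        Console.WriteLine(manual);
        Console.WriteLine(manual == library); // True
    }
}
```

Decimal and hex forms are interchangeable: &#1777; and &#x6F1; both decode to the same digit.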
Upvotes: 1
Reputation: 81583
Use HttpUtility.HtmlDecode. From its documentation:
Converts a string that has been HTML-encoded for HTTP transmission into a decoded string.
To encode or decode values outside of a web application, use the WebUtility class.
Example
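using System.Web; // HttpUtility lives in System.Web
var decoded = HttpUtility.HtmlDecode("&#x6F1;&#x6F3;&#x6F9;&#x6F9;-&#x6F1;&#x6F1;-&#x6F2;&#x6F0; &#x6F2;&#x6F3;:&#x6F2;&#x6F7;");
Console.WriteLine(decoded);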
Output
۱۳۹۹-۱۱-۲۰ ۲۳:۲۷
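Applied to the code in the question, the fix amounts to decoding the node's InnerText before using it. The sketch below assumes the HtmlAgilityPack NuGet package is referenced; to stay self-contained it loads a small hypothetical HTML snippet with LoadHtml instead of calling LoadFromWebAsync, and the helper name GetDecodedText is made up for illustration.

```csharp
using System;
using System.Net;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package

public static class HapDecodeDemo
{
    // Hypothetical helper: returns the decoded text of the first node
    // matching the XPath expression, or null when nothing matches.
    public static string GetDecodedText(HtmlDocument doc, string xpath)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node == null)
        {
            return null;
        }
        // InnerText can still contain references such as &#x6F1;,
        // so turn them back into the characters they represent.
        return WebUtility.HtmlDecode(node.InnerText);
    }

    public static void Main()
    {
        // Hypothetical markup standing in for the page from the question
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body><span>&#x6F1;&#x6F3;&#x6F9;&#x6F9;</span></body></html>");

        string date = GetDecodedText(doc, "//span");
        Console.WriteLine(date);
    }
}
```

SelectSingleNode is used instead of SelectNodes(...).First() so that a missing node yields null rather than an exception; with the question's long XPath the call would look the same.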
Upvotes: 2