mahdi aghasi

Reputation: 378

Html Agility Pack returns incomprehensible text

When I use Html Agility Pack, I set the encoding to UTF-8. This works well for some text, but in some cases it returns text like the following:

&#1777;&#1779;&#1785;&#1785;-&#1777;&#1777;-&#1778;&#1776; &#1778;&#1779;:&#1778;&#1783;

My code is roughly as follows:

 // Fetch the page, forcing UTF-8 instead of auto-detecting the encoding
 HtmlWeb web2 = new HtmlWeb();
 web2.AutoDetectEncoding = false;
 web2.OverrideEncoding = Encoding.UTF8;
 var doc = await web2.LoadFromWebAsync(url);
 // Read the text of the first node matched by the XPath
 date = doc.DocumentNode
     .SelectNodes("/html/body/div[2]/main/div[2]/div[2]/div[1]/div[1]/div[2]/span[1]")
     .First().InnerText;

I should add that the same problem occurs when I do not set the encoding at all.

Does anyone know where the problem is?

Upvotes: 0

Views: 50

Answers (2)

Wesley

Reputation: 76

These are HTML entities representing the original text. Inside a web application you can use HttpUtility.HtmlDecode from the System.Web namespace. Outside of a web application you can use WebUtility.HtmlDecode from the System.Net namespace. Either one converts the HTML entities back into the corresponding text.
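
To make the mapping concrete: a numeric character reference such as `&#1777;` is just the decimal Unicode code point of one character. The following minimal sketch decodes a single reference by hand and compares the result with WebUtility.HtmlDecode. The sample reference is an assumption (the decimal form for the Extended Arabic-Indic digit one); the page in the question may use a different form, such as hexadecimal.

```csharp
using System;
using System.Net;

public class Program
{
    public static void Main()
    {
        // Assumed sample: decimal reference for U+06F1 (Extended Arabic-Indic digit one).
        var reference = "&#1777;";

        // Strip the "&#" prefix and the ";" suffix to get the code point as a number.
        var codePoint = int.Parse(reference.Substring(2, reference.Length - 3));

        // Convert the code point to the character it stands for.
        var manual = char.ConvertFromUtf32(codePoint);

        // WebUtility.HtmlDecode performs the same mapping for a whole string.
        var decoded = WebUtility.HtmlDecode(reference);

        Console.WriteLine(manual == decoded); // True
    }
}
```

This also explains why changing the encoding does not help: the bytes are decoded correctly, but the page itself spells each character as a reference.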

Running it through a fiddle resulted in

۱۳۹۹-۱۱-۲۰ ۲۳:۲۷

https://dotnetfiddle.net/J7YXZM

using System;
using System.Net;

public class Program
{
    public static void Main()
    {
        var encoded = "&#1777;&#1779;&#1785;&#1785;-&#1777;&#1777;-&#1778;&#1776; &#1778;&#1779;:&#1778;&#1783;";
        var decoded = WebUtility.HtmlDecode(encoded);
        Console.WriteLine(decoded);
    }
}
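
Applied to the scraping code from the question, the decode step would go right after reading InnerText. This is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The inline HTML and the `//span` XPath are made up for illustration and stand in for the real page and the asker's selector.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        // Hypothetical markup standing in for the downloaded page.
        var html = "<html><body><span>&#1777;&#1779;&#1785;&#1785;</span></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Read the node text; in the question this still contained the references.
        var raw = doc.DocumentNode.SelectSingleNode("//span").InnerText;

        // Decode the references into the actual characters.
        var date = WebUtility.HtmlDecode(raw);

        Console.WriteLine(date);
    }
}
```

Html Agility Pack also ships its own helper, HtmlEntity.DeEntitize(raw), which should give the same result for this input if you prefer not to depend on System.Net.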

Upvotes: 1

TheGeneral

Reputation: 81583

Use HtmlDecode

Converts a string that has been HTML-encoded for HTTP transmission into a decoded string.

To encode or decode values outside of a web application, use the WebUtility class.

Example

var asd = HttpUtility.HtmlDecode("&#1777;&#1779;&#1785;&#1785;-&#1777;&#1777;-&#1778;&#1776; &#1778;&#1779;:&#1778;&#1783;");
Console.WriteLine(asd);

Output

۱۳۹۹-۱۱-۲۰ ۲۳:۲۷

Full Demo Here

Upvotes: 2
