VirtuoZ

Reputation: 163

HtmlAgilityPack file loading

I have a problem when trying to load a file from the filesystem. The issue is that the value of one HTML control contains a less-than sign "<" inside a span.

using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.OptionReadEncoding = true;

// Open the file as UTF-8 (honoring a byte order mark if present) and load from the reader,
// so the document is not re-read from the underlying stream as ASCII.
using (StreamReader str = new StreamReader(@"E:\HTMLS\OEL\1030,1.html", Encoding.UTF8, true))
{
    doc.Load(str);
}

// Remove text nodes that contain only line breaks and tabs
doc.DocumentNode.Descendants().Where(x => x.Name == "#text" && (x.InnerText == "\r\n\t" || x.InnerText == "\r\n" || x.InnerText == "\r\n\t\t")).ToList().ForEach(x => x.Remove());

// All table nodes
List<HtmlNode> listHtmlNode = doc.DocumentNode.Descendants("table").ToList();

Upvotes: 1

Views: 1707

Answers (1)

CeejeeB

Reputation: 3114

You shouldn't have raw symbols such as < as text content in your HTML. An unescaped < makes the HTML invalid and can cause HtmlAgilityPack to parse the document incorrectly.

If you need them in your HTML, you have to encode them as character entities: < becomes &lt;. See http://www.w3schools.com/html/html_entities.asp
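As a rough sketch, assuming you control the code that generates the HTML files, the text could be encoded before it is written into the span. `SpanBuilder` and `BuildSpan` are hypothetical names made up for this example; `WebUtility.HtmlEncode` is the standard .NET method (available from .NET 4.5).

```csharp
using System.Net;

public static class SpanBuilder
{
    // Hypothetical helper: wraps text in a <span>, encoding characters
    // such as '<' and '&' so they are not mistaken for markup.
    public static string BuildSpan(string text)
    {
        return "<span>" + WebUtility.HtmlEncode(text) + "</span>";
    }
}
```

For example, `SpanBuilder.BuildSpan("1 < 2")` gives `<span>1 &lt; 2</span>`, which a parser can read without treating the `<` as the start of a tag. If the files come from somewhere else and you cannot change how they are produced, this sketch does not apply as-is.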

Upvotes: 2
