Reputation: 163
I have a problem when trying to load a file from the filesystem. The issue is that the value of some HTML control contains a less-than sign "<" inside a span.
HtmlDocument doc = new HtmlDocument();
doc.OptionReadEncoding = true;
//StreamReader str = new StreamReader(fileName, Encoding.UTF8);
StreamReader str = new StreamReader(@"E:\HTMLS\OEL\1030,1.html",Encoding.UTF8,true);
doc.Load(str.BaseStream, Encoding.UTF8); // decode with the same encoding as the reader, not ASCII
//string streamString = str.ReadToEnd().
str.Close();
//all nodes
doc.DocumentNode.Descendants().Where(x => x.Name == "#text" && (x.InnerText == "\r\n\t" || x.InnerText == "\r\n" || x.InnerText == "\r\n\t\t")).ToList().ForEach(x => x.Remove());
List<HtmlNode> listHtmlNode = doc.DocumentNode.Descendants("table").ToList();
Upvotes: 1
Views: 1707
Reputation: 3114
You shouldn't have a literal < as text content in your HTML. A raw < in text makes the HTML invalid and can cause the HTML Agility Pack to misparse whatever follows it.
If you need the character in your HTML, you need to encode it: < becomes &lt;
See http://www.w3schools.com/html/html_entities.asp
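As a rough illustration of the encoding step, here is a minimal sketch. It assumes you either control the text before it is written into the HTML (in which case the standard `System.Net.WebUtility.HtmlEncode` does the job), or you have to repair files that already exist. The class name `LessThanEscaping`, both method names, and the regular expression are hypothetical helpers made up for this example; they are not part of the HTML Agility Pack, and the regex is only a heuristic.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration only.
public static class LessThanEscaping
{
    // Heuristic (an assumption): a '<' that is NOT followed by a letter,
    // '/', '!' or '?' cannot start a tag, comment or doctype, so treat it
    // as literal text.
    private static readonly Regex StrayLessThan =
        new Regex(@"<(?![A-Za-z/!?])");

    // Encode a text value before it is placed inside HTML markup.
    // WebUtility.HtmlEncode turns '<' into "&lt;" (and also encodes
    // '>', '&' and quotes).
    public static string EncodeText(string text)
    {
        return WebUtility.HtmlEncode(text);
    }

    // Repair existing markup by escaping only the stray '<' characters,
    // leaving real tags untouched.
    public static string EscapeStrayLessThan(string html)
    {
        return StrayLessThan.Replace(html, "&lt;");
    }
}
```

For existing files, one option would be to read the file into a string, pass it through `EscapeStrayLessThan`, and hand the result to `doc.LoadHtml(...)` instead of calling `doc.Load(...)` on the stream. Note the limit of the heuristic: a `<` directly followed by a letter (for example `a<b`) still looks like the start of a tag and is left alone, so fixing the HTML at its source remains the more reliable option.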
Upvotes: 2