Reputation: 1054
I have the following method (I'm using the HtmlAgilityPack):
public DataTable tableIntoTable(HtmlDocument doc)
{
    var nodes = doc.DocumentNode.SelectNodes("//table");
    var table = new DataTable("MyTable");
    table.Columns.Add("raw", typeof(string));
    foreach (var node in nodes)
    {
        if (
            (!node.InnerHtml.Contains("pldefault"))
            && (!node.InnerHtml.Contains("ntdefault"))
            && (!node.InnerHtml.Contains("bgtabon"))
        )
        {
            table.Rows.Add(node.InnerHtml);
        }
    }
    return table;
}
It accepts HTML fetched with this:
public HtmlDocument getDataWithGet(string url)
{
    using (var wb = new WebClient())
    {
        string response = wb.DownloadString(url);
        var doc = new HtmlDocument();
        doc.LoadHtml(response);
        return doc;
    }
}
Everything works fine with an HTML document that is 3294 lines long.
When I feed it HTML that is 33960 lines long, I get "StackOverflowException was unhandled" at the if statement in the tableIntoTable method, as shown in this image:
https://i.sstatic.net/rgZqd.jpg
I thought it might be related to the MaxHttpCollectionKeys limit of 1000, so I added this to my Web.config, but it still doesn't work: <add key="aspnet:MaxHttpCollectionKeys" value="9999" />
I'm not sure where to go from here; it only breaks with larger HTML documents.
Upvotes: 1
Views: 165
Reputation: 2372
Assuming the values in your if statement appear in an attribute value of some descendant of a table, you can move the filter into the XPath expression:
var xpath = @"//table[not(.//*[contains(@*,'pldefault') or
                           contains(@*,'ntdefault') or
                           contains(@*,'bgtabon')])]";
var tables = doc.DocumentNode.SelectNodes(xpath);
Update: more accurately, based on your comments:
@"//table[not(.//td[contains(@class,'pldefault') or
                    contains(@class,'ntdefault') or
                    contains(@class,'bgtabon')])]";
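For completeness, here is a minimal sketch of how the question's method might look with the filter moved into the XPath. The class and method names (TableFilter, TablesWithoutMarkedCells) are made up for illustration, and it assumes the three values really are td class names. I have not verified that this avoids the StackOverflowException on the large document, since InnerHtml is still read for every table that passes the filter.

```csharp
using System.Data;
using HtmlAgilityPack;

public static class TableFilter
{
    // Hypothetical helper: same output shape as tableIntoTable in the question,
    // but the class-name checks are done by the XPath query instead of
    // string searches on InnerHtml.
    public static DataTable TablesWithoutMarkedCells(HtmlDocument doc)
    {
        // Assumption: the markers are class names on td elements.
        const string xpath =
            "//table[not(.//td[contains(@class,'pldefault') or " +
            "contains(@class,'ntdefault') or " +
            "contains(@class,'bgtabon')])]";

        var table = new DataTable("MyTable");
        table.Columns.Add("raw", typeof(string));

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return table;
        }

        foreach (var node in nodes)
        {
            table.Rows.Add(node.InnerHtml);
        }
        return table;
    }
}
```

You would call it as TableFilter.TablesWithoutMarkedCells(getDataWithGet(url)). The null check matters because the original foreach would throw a NullReferenceException on a page with no matching tables.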
Upvotes: 1