Jrow

Reputation: 1054

MVC StackOverflowException with larger html data

I have the following method (I'm using the HtmlAgilityPack):

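using System.Data;
using HtmlAgilityPack;

public DataTable tableIntoTable(HtmlDocument doc)
{
    var nodes = doc.DocumentNode.SelectNodes("//table");
    var table = new DataTable("MyTable");
    table.Columns.Add("raw", typeof(string));

    // SelectNodes returns null (not an empty collection) when nothing matches
    if (nodes == null)
        return table;

    foreach (var node in nodes)
    {
        // Keep only tables whose inner HTML contains none of the marker strings
        if (!node.InnerHtml.Contains("pldefault")
            && !node.InnerHtml.Contains("ntdefault")
            && !node.InnerHtml.Contains("bgtabon"))
        {
            table.Rows.Add(node.InnerHtml);
        }
    }
    return table;
}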

It accepts HTML fetched with this method:

using System.Net;
using HtmlAgilityPack;

public HtmlDocument getDataWithGet(string url)
{
    using (var wb = new WebClient())
    {
        string response = wb.DownloadString(url);
        var doc = new HtmlDocument();
        doc.LoadHtml(response);
        return doc;
    }
}

Everything works fine with an HTML document that is 3,294 lines long.
When I feed it HTML that is 33,960 lines long, I get "StackOverflowException was unhandled" at the if statement in the tableIntoTable method, as shown in this screenshot: https://i.sstatic.net/rgZqd.jpg

I thought it might be related to the MaxHttpCollectionKeys limit of 1000, so I tried putting this in my Web.config, but it still doesn't work:

<add key="aspnet:MaxHttpCollectionKeys" value="9999" />

I'm not sure where to go from here; it only breaks with larger HTML documents.

Upvotes: 1

Views: 165

Answers (1)

Xi Sigma

Reputation: 2372

Assuming the strings in your if statement appear in an attribute value of some descendant of the table, you can filter the tables in the XPath query itself:

var xpath = @"//table[not(.//*[@*[contains(., 'pldefault') or
                                  contains(., 'ntdefault') or
                                  contains(., 'bgtabon')]])]";

var tables = doc.DocumentNode.SelectNodes(xpath);

Update: based on your comments, a more accurate version is:

var xpath = @"//table[not(.//td[contains(@class,'pldefault') or
                                contains(@class,'ntdefault') or
                                contains(@class,'bgtabon')])]";
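To check the updated expression without a network request, it can be run against a small sample page. This is a sketch for illustration, not part of the original answer: it uses System.Xml's XmlDocument instead of HtmlAgilityPack so it runs without a NuGet package, which only works because the sample is well-formed XML. HtmlAgilityPack evaluates XPath through .NET's XPathNavigator, so the same string should select the same tables when passed to doc.DocumentNode.SelectNodes. The class TableFilter, the method KeptTableIds and the sample markup are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for illustration: runs the answer's XPath filter with System.Xml.
public static class TableFilter
{
    // Keep only tables that have no descendant <td> whose class contains a marker string.
    public const string TableXPath =
        "//table[not(.//td[contains(@class,'pldefault') or " +
        "contains(@class,'ntdefault') or " +
        "contains(@class,'bgtabon')])]";

    // Returns the id attribute of every table the XPath keeps (empty string if no id).
    public static List<string> KeptTableIds(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // requires well-formed markup, unlike HtmlAgilityPack
        var ids = new List<string>();
        foreach (XmlNode table in doc.SelectNodes(TableXPath))
        {
            ids.Add(table.Attributes["id"]?.Value ?? "");
        }
        return ids;
    }

    public static void Main()
    {
        // Hypothetical sample: one plain table and two tables carrying marker classes.
        const string sample =
            "<html><body>" +
            "<table id='keep'><tr><td class='data'>1</td></tr></table>" +
            "<table id='drop1'><tr><td class='pldefault'>x</td></tr></table>" +
            "<table id='drop2'><tr><td class='row bgtabon'>y</td></tr></table>" +
            "</body></html>";

        foreach (var id in KeptTableIds(sample))
            Console.WriteLine(id); // prints "keep"
    }
}
```

In tableIntoTable, the same string would take the place of "//table", and every returned node could be added to the DataTable without the InnerHtml.Contains checks. HtmlAgilityPack's SelectNodes still returns null when no table matches, so the null check is still needed.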

Upvotes: 1
