How to parse an HTML table (from file) by a specific ID

Question

I am trying to get a specific table (by id) from a downloaded HTML file and parse it. I've tried a few ways, and my latest code is:

        var url = @"C:\Users\name\Plocha\web.html";

        var doc = new HtmlDocument();

        doc.Load(url);

        string data = "";
        int i = 2;
        foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table"))
        {
            Console.WriteLine($"Found: {table.Id}");
            if (table.Id == "formTbl")
            {
                foreach (HtmlNode row in table.SelectNodes("//tr"))
                {
                    foreach (HtmlNode cell in row.SelectNodes("td"))
                    {
                        if (i == 1)
                        {
                            data += $"Column:{cell.InnerText}";
                            i = 2;
                        }
                        else if (i == 2)
                        {
                            data += $"Row: {cell.InnerText}";
                            Console.WriteLine(data);
                            data = "";
                            i = 1;
                        }
                    }
                }
            }
            else
            {
                Console.WriteLine("Not what we want");
            }
        }

The problem is that it prints all tables from the web page, even though I have specified to continue only if the id is formTbl.

How the data looks in the table: there are no column headers, just two rows. The first row holds the column names and the second row holds the values.

NPras · Accepted Answer

SelectNodes() takes an XPath query. One rule from the XPath syntax is particularly relevant to your case: //book selects all book elements no matter where they are in the document.

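This means that "//tr" searches the whole document, even when you call it on the table node. To respect the scope, use "tr" (direct children of the current node) or ".//tr" (descendants of the current node at any depth) instead.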
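To see the scoping difference concretely, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the inline HTML and the second table's id ('other') are made up for illustration and are not from the question.

```csharp
using System;
using HtmlAgilityPack;

class XPathScopeDemo
{
    static void Main()
    {
        // Hypothetical markup: two tables, only one of which has the id we want.
        var html =
            "<table id='other'><tr><td>x</td></tr></table>" +
            "<table id='formTbl'><tr><td>Name</td></tr><tr><td>Alice</td></tr></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[@id='formTbl']");

        // "//tr" is absolute: it starts at the document root,
        // so it also finds the row that belongs to the 'other' table.
        Console.WriteLine(table.SelectNodes("//tr").Count);   // all 3 rows in the document

        // "tr" is relative: only <tr> elements that are direct children of this table.
        Console.WriteLine(table.SelectNodes("tr").Count);     // the 2 rows of formTbl

        // ".//tr" is also relative, but matches <tr> at any depth below this table
        // (useful when the rows sit inside a <tbody>).
        Console.WriteLine(table.SelectNodes(".//tr").Count);  // the 2 rows of formTbl
    }
}
```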

You could even use XPath to do both the id lookup and the row selection in a single query:

foreach (var row in doc.DocumentNode.SelectNodes("//table[@id='formTbl']/tr"))
{
    // ...do  stuff
    foreach (var cell in row.SelectNodes("td"))
    {
        // ... do  stuff
    }
}
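
For the table layout described in the question (first row holds the column names, second row holds the values), a hedged sketch of the whole flow might look like this. The `ReadFormTable` helper name, passing the file path on the command line, and the assumption that both rows contain the same number of `td` cells are all illustrative and not from the original post. Keep in mind that SelectNodes() returns null rather than an empty collection when nothing matches, so the sketch checks for that before iterating.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class FormTableReader
{
    // Hypothetical helper: pairs each cell of the first row (name)
    // with the cell at the same position in the second row (value).
    public static List<KeyValuePair<string, string>> ReadFormTable(HtmlDocument doc)
    {
        var result = new List<KeyValuePair<string, string>>();

        // "//tr" after the id predicate also matches rows wrapped in <tbody>.
        // It would match rows of nested tables as well; this sketch assumes there are none.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table[@id='formTbl']//tr");
        if (rows == null || rows.Count < 2)
            return result; // table missing, or not in the assumed two-row shape

        // Relative queries: only the cells of each specific row.
        HtmlNodeCollection names = rows[0].SelectNodes("td");
        HtmlNodeCollection values = rows[1].SelectNodes("td");
        if (names == null || values == null)
            return result;

        int count = Math.Min(names.Count, values.Count);
        for (int i = 0; i < count; i++)
        {
            result.Add(new KeyValuePair<string, string>(
                names[i].InnerText.Trim(),
                values[i].InnerText.Trim()));
        }
        return result;
    }

    static void Main(string[] args)
    {
        var doc = new HtmlDocument();
        doc.Load(args[0]); // path to the downloaded HTML file (assumed to be the first argument)

        foreach (var pair in ReadFormTable(doc))
            Console.WriteLine($"Column: {pair.Key}, Value: {pair.Value}");
    }
}
```

If the header cells turn out to be `th` rather than `td`, the first-row query would need to be `"th|td"` instead.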
