Justyn

Reputation: 85

How to parse an HTML table (from a file) by a specific ID

I am trying to get a specific table (by id) from a downloaded HTML file and parse it. I've tried a few ways, and my latest code is:

        var url = @"C:\Users\name\Plocha\web.html";

        var doc = new HtmlDocument();

        doc.Load(url);

        string data = "";
        int i = 2;
        foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table"))
        {
            Console.WriteLine($"Found: {table.Id}");
            if (table.Id == "formTbl")
            {
                foreach (HtmlNode row in table.SelectNodes("//tr"))
                {
                    foreach (HtmlNode cell in row.SelectNodes("td"))
                    {
                        if (i == 1)
                        {
                            data += $"Column:{cell.InnerText}";
                            i = 2;
                        }
                        else if (i == 2)
                        {
                            data += $"Row: {cell.InnerText}";
                            Console.WriteLine(data);
                            data = "";
                            i = 1;
                        }
                    }
                }
            }
            else
            {
                Console.WriteLine("Not what we want");
            }
        }

The problem is that it prints the rows of all tables on the web page, even though I have specified that it should continue only if the id is formTbl.

How the data looks in the table (there are no column headers, just two rows: the first row holds the column names and the second row holds the values): Table

Upvotes: 1

Views: 1147

Answers (2)

Justyn

Reputation: 85

foreach (var table in doc.DocumentNode.SelectNodes("//table[@id='formTbl']"))
{
    foreach (var row in table.SelectNodes("tbody/tr"))
    {
        Console.WriteLine(row.Id);
        foreach (var cell in row.SelectNodes("td"))
        {
            Console.WriteLine(cell.InnerText);
        }
    }
}

The problem was that I hadn't used tbody/tr: the rows sit inside a <tbody> element, so they are not direct children of the <table>.

Thanks to @NPras

Upvotes: 0

NPras

Reputation: 4115

SelectNodes() takes an XPath query. Some useful examples here. One that is particularly relevant to your case: //book selects all book elements no matter where they are in the document.

This means that "//tr" searches the whole document, even when you call it on a table node. If you want to respect the scope, use "tr" for direct children of the current node, or ".//tr" for any of its descendants.
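The scoping difference can be checked on a small inline document. This is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the sample markup, the class name XPathScopeDemo and the helper Count are made up for illustration and do not come from the question.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathScopeDemo
{
    // Hypothetical sample markup: two tables, only the second has the wanted id.
    private const string Html =
        "<table id='other'><tr><td>x</td></tr></table>" +
        "<table id='formTbl'><tbody>" +
        "<tr><td>Name</td></tr><tr><td>Justyn</td></tr>" +
        "</tbody></table>";

    // Runs an XPath query with the formTbl node as context and returns the match count.
    // SelectNodes returns null when nothing matches, hence the null check.
    public static int Count(string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);
        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[@id='formTbl']");
        HtmlNodeCollection nodes = table.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        Console.WriteLine(Count("//tr"));      // 3: absolute path, every <tr> in the document
        Console.WriteLine(Count(".//tr"));     // 2: descendants of this table only
        Console.WriteLine(Count("tr"));        // 0: rows are children of <tbody>, not of <table>
        Console.WriteLine(Count("tbody/tr"));  // 2: explicit path through <tbody>
    }
}
```

The third line is why a plain "tr" can come back empty on real pages: when the markup contains an explicit <tbody>, the rows are one level deeper than the table itself.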

You could even use XPath to do the id search and select the <tr> elements underneath it in a single query:

foreach (var row in doc.DocumentNode.SelectNodes("//table[@id='formTbl']/tr"))
{
    // ...do <tr> stuff
    foreach (var cell in row.SelectNodes("td"))
    {
        // ... do <td> stuff
    }
}
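One caveat with the single-query form: in HtmlAgilityPack, SelectNodes returns null rather than an empty collection when nothing matches, so the foreach throws if the rows actually sit inside a <tbody>. The following is a minimal sketch of one way to guard against that, assuming the HtmlAgilityPack NuGet package is referenced. The RowFinder class, the CellTexts method and the sample markup are hypothetical, and the XPath union (|) is just one possible way to cover both layouts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class RowFinder
{
    // Returns the trimmed text of each cell, one list per row, for the table with the given id.
    // Hypothetical helper; assumes tableId contains no single quote, since it is
    // pasted straight into the XPath string.
    public static List<List<string>> CellTexts(string html, string tableId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        string table = $"//table[@id='{tableId}']";
        // Match rows directly under <table> or under <tbody>.
        // SelectNodes yields null when there is no match, so fall back to an empty sequence.
        IEnumerable<HtmlNode> rows =
            doc.DocumentNode.SelectNodes($"{table}/tr | {table}/tbody/tr")
            ?? Enumerable.Empty<HtmlNode>();

        return rows
            .Select(row => (row.SelectNodes("td") ?? Enumerable.Empty<HtmlNode>())
                .Select(cell => cell.InnerText.Trim())
                .ToList())
            .ToList();
    }

    public static void Main()
    {
        // Made-up markup with an explicit <tbody>, as in the asker's file.
        string html =
            "<table id='formTbl'><tbody>" +
            "<tr><td>Name</td><td>City</td></tr>" +
            "<tr><td>Justyn</td><td>Praha</td></tr>" +
            "</tbody></table>";

        foreach (List<string> row in CellTexts(html, "formTbl"))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

With this shape, a missing table or a table without rows simply produces an empty list instead of a NullReferenceException.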

Upvotes: 1
