user1444921

Reputation: 11

Getting text from elements without id or class name

I am trying to parse HTML using Html Agility Pack. Is there a tutorial available, or can someone tell me how I can get the text from a <td> that has no id and no class?

    <table id="results-table">
    <tr class="row1">
    <td>Diode Zener Single 12V 5% 1W 2-Pin DO-41 Bulk</td> 
    ...

Each row contains 10 different <td> elements. Thanks!

Upvotes: 0

Views: 2378

Answers (3)

Tan Woods

Reputation: 1

I guess some of your td tags will have a class or an id. Use the following code, which I wrote in LINQPad. It keeps only the td elements that have neither attribute.

void Main()
{
    var webGet = new HtmlAgilityPack.HtmlDocument();
    // Web page/string that needs to be parsed
    webGet.LoadHtml("<table id='results-table'>" +
                    "<tr class='row1'>" +
                    "<td class='testclass'>test td with class</td>" +
                    "<td id='testid'>test td with id</td>" +
                    "<td>Diode Zener Single 12V 5% 1W 2-Pin DO-41 Bulk</td>" +
                    "<td>test td without class or id</td>" +
                    "</tr>" +
                    "</table>");

    // Keep only non-empty <td> elements that have neither a class nor an id attribute
    var tableOnPage = (from tds in webGet.DocumentNode.Descendants()
                       where tds.Name == "td" &&
                             tds.Attributes["class"] == null && tds.Attributes["id"] == null &&
                             tds.InnerText.Trim().Length > 0
                       select new
                       {
                           // InnerText also works when the cell contains nested markup
                           td = tds.InnerText.Trim(),
                       });

    // Loop through each item
    foreach (var item in tableOnPage)
    {
        Console.WriteLine(item.td);
    }
}

The output will be:

Diode Zener Single 12V 5% 1W 2-Pin DO-41 Bulk

test td without class or id

Upvotes: 0

Chani Poz

Reputation: 1463

Here is a link that explains how to use XPath:

http://www.w3schools.com/xpath/
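
Applied to the HTML in the question, an XPath predicate can filter on the missing attributes directly. The following is only a minimal sketch, assuming Html Agility Pack is referenced in the project; the class `PlainCellExample`, the method `GetPlainCellTexts`, and the sample markup are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class PlainCellExample
{
    // Hypothetical helper: returns the text of every <td> inside the
    // results table that has neither an id nor a class attribute.
    public static List<string> GetPlainCellTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // not(@id) and not(@class) keeps only cells lacking both attributes.
        var cells = doc.DocumentNode.SelectNodes(
            "//table[@id='results-table']//td[not(@id) and not(@class)]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (cells == null)
        {
            return new List<string>();
        }

        // Decode entities such as &amp; and trim surrounding whitespace.
        return cells
            .Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim())
            .ToList();
    }

    public static void Main()
    {
        // Sample markup modelled on the question; the first cell is an assumption.
        const string html =
            "<table id='results-table'>" +
            "<tr class='row1'>" +
            "<td class='part'>1N4742A</td>" +
            "<td>Diode Zener Single 12V 5% 1W 2-Pin DO-41 Bulk</td>" +
            "</tr>" +
            "</table>";

        foreach (var text in GetPlainCellTexts(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

The `//td` step is used instead of `/tr/td` so the query still matches if the rows happen to be wrapped in a `<tbody>` in the real page.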

Upvotes: 2

Anil Vangari

Reputation: 570

You can try this XPath to query all the tds within the table that has id="results-table":

//table[@id='results-table']/tr/td

FirePath for Firefox can help you formulate the XPath, and you can adjust it from there.

Sample code below

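using System.Diagnostics;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
var fileName = @"..\..\..\docs\10960189.htm";
doc.Load(fileName);

// SelectNodes returns null (not an empty collection) when nothing matches
var nodes = doc.DocumentNode.SelectNodes("//table[@id='results-table']/tr/td");

if (nodes != null)
{
    foreach (var node in nodes)
    {
        Debug.WriteLine(node.InnerText);
    }
}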
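
Since each row holds several cells, a positional predicate can pick out a single column. This is a rough sketch under the assumption that the wanted cell always sits at the same position in its row; `ColumnExample` and `GetColumnTexts` are hypothetical names, and XPath positions start at 1, not 0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ColumnExample
{
    // Hypothetical helper: returns the text of the cell at the given
    // 1-based position in every row of the results table.
    public static List<string> GetColumnTexts(string html, int column)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // td[n] selects the n-th <td> child of each matched <tr>.
        var xpath = $"//table[@id='results-table']//tr/td[{column}]";
        var cells = doc.DocumentNode.SelectNodes(xpath);

        // SelectNodes returns null when nothing matches.
        if (cells == null)
        {
            return new List<string>();
        }

        return cells.Select(c => c.InnerText.Trim()).ToList();
    }
}
```

Calling `ColumnExample.GetColumnTexts(html, 3)` would return the third cell of every row, or an empty list if no row has that many cells.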

HTH

Upvotes: 3
