Reputation: 11
I am trying to parse HTML using Html Agility Pack. Is there a tutorial available, or can someone tell me how I can get the text from a <td> that has no id and no class?
<table id="results-table">
<tr class="row1">
<td>Diode Zener Single 12V 5% 1W 2-Pin DO-41 Bulk</td>
...
Each row contains 10 different <td> elements. Thanks!
Upvotes: 0
Views: 2378
Reputation: 1
I guess some of your td tags will have a class or id, so the query below skips those. Use the following code; I wrote it in LINQPad.
void Main()
{
    var webGet = new HtmlAgilityPack.HtmlDocument();
    // web page/string that needs to be parsed
    webGet.LoadHtml("<table id='results-table'>" +
        "<tr class='row1'>" +
        "<td class='testclass'>test td with class</td>" +
        "<td id='testid'>test td with id</td>" +
        "<td>Diode Zener Single 12V 5% 1W 2-Pin DO-41 Bulk</td>" +
        "<td>test td without class or id</td>" +
        "</tr>" +
        "</table>"
    );
    // keep only non-empty <td> elements that have neither a class nor an id
    var tableOnPage = (from tds in webGet.DocumentNode.Descendants()
                       where tds.Name == "td" &&
                             tds.Attributes["class"] == null && tds.Attributes["id"] == null &&
                             tds.InnerText.Trim().Length > 0
                       select new
                       {
                           // InnerText also works when the cell contains nested markup
                           td = tds.InnerText.Trim(),
                       });
    // loop through each item
    foreach (var item in tableOnPage)
    {
        Console.WriteLine(item.td);
    }
}
The output will be:
Diode Zener Single 12V 5% 1W 2-Pin DO-41 Bulk
test td without class or id
Upvotes: 0
Reputation: 1463
Here is a link that explains how to use XPath:
http://www.w3schools.com/xpath/
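As a rough sketch of how XPath applies to the question, a predicate with not(@id) and not(@class) should select only the cells that carry neither attribute. The inline HTML here is a made-up stand-in for the real page, and the Program class is only scaffolding for illustration:

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();
        // Hypothetical sample markup; replace with the real page source.
        doc.LoadHtml(
            "<table id='results-table'>" +
            "<tr class='row1'>" +
            "<td class='testclass'>cell with class</td>" +
            "<td>Diode Zener Single 12V 5% 1W 2-Pin DO-41 Bulk</td>" +
            "</tr>" +
            "</table>");

        // Select every <td> under the table that has no id and no class attribute.
        var nodes = doc.DocumentNode.SelectNodes(
            "//table[@id='results-table']//td[not(@id) and not(@class)]");

        // SelectNodes returns null when nothing matches.
        if (nodes != null)
        {
            foreach (var node in nodes)
            {
                Console.WriteLine(node.InnerText.Trim());
            }
        }
    }
}
```

If every cell in the real table lacks both attributes, this predicate will match all of them, so it only narrows things down when some cells are marked up.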
Upvotes: 2
Reputation: 570
You can try using this XPath to query all the <td>s within the table that has id="results-table":
//table[@id='results-table']/tr/td
FirePath for Firefox can help you formulate the XPath, and you can adjust it from there.
Sample code below:
HtmlDocument doc = new HtmlDocument();
var fileName = @"..\..\..\docs\10960189.htm";
doc.Load(fileName);
var nodes = doc.DocumentNode.SelectNodes("//table[@id='results-table']/tr/td");
// SelectNodes returns null (not an empty collection) when nothing matches
if (nodes != null)
{
    foreach (var node in nodes)
    {
        Debug.WriteLine(node.InnerText);
    }
}
HTH
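Since each row holds 10 cells and none of them may have an id or class, picking the cell by position within each row is another option. This is only a sketch: the columnIndex value and the inline HTML are assumptions for illustration, not taken from the real page.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();
        // Hypothetical sample markup; replace with the real page source.
        doc.LoadHtml(
            "<table id='results-table'>" +
            "<tr class='row1'><td>first</td><td>second</td><td>third</td></tr>" +
            "<tr class='row2'><td>one</td><td>two</td><td>three</td></tr>" +
            "</table>");

        // Assumed zero-based position of the wanted column.
        const int columnIndex = 1;

        var rows = doc.DocumentNode.SelectNodes("//table[@id='results-table']/tr");
        if (rows == null)
        {
            return; // no matching rows
        }

        foreach (var row in rows)
        {
            // Elements("td") yields only the direct <td> children of this row.
            var cells = row.Elements("td").ToList();
            if (cells.Count > columnIndex)
            {
                Console.WriteLine(cells[columnIndex].InnerText.Trim());
            }
        }
    }
}
```

Working row by row keeps the cells of one result together, which is convenient when several columns of the same row are needed.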
Upvotes: 3