Reputation: 71
I want to read the table shown in this link.
When I try to select it with HtmlAgilityPack, I get null:
var nodes = document.DocumentNode.SelectNodes("//table[contains(@class, 'table')]");
Can you please tell me what the issue is? Am I doing this the wrong way?
Upvotes: 0
Views: 760
Reputation: 2466
There is nothing wrong with your XPath. I'm going to assume the missing piece is how to get the data out of the table once it has been selected. It is worth reading up on XPath; the example below walks through it.
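using System;
using System.Net;
using System.Text;
using HtmlAgilityPack;

public class Program
{
    public static void Main(string[] args)
    {
        HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        try
        {
            // Download the page and parse the response stream as UTF-8.
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://www.manualslib.com/brand/A.html");
            request.Method = "GET";
            request.ContentType = "text/html;charset=utf-8";
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                using (var stream = response.GetResponseStream())
                {
                    doc.Load(stream, Encoding.GetEncoding("utf-8"));
                }
            }
        }
        catch (WebException ex)
        {
            Console.WriteLine(ex.Message);
        }

        // Works fine
        HtmlNode tablebody = doc.DocumentNode.SelectSingleNode("//table[contains(@class, 'table')]/tbody");
        if (tablebody == null)
        {
            // Nothing matched, for example because the download failed above.
            Console.WriteLine("Table not found.");
            return;
        }

        foreach (HtmlNode tr in tablebody.SelectNodes("./tr"))
        {
            Console.WriteLine("\nTableRow: ");
            // SelectNodes returns null when a row has no td children.
            HtmlNodeCollection cells = tr.SelectNodes("./td");
            if (cells == null)
            {
                continue;
            }
            foreach (HtmlNode td in cells)
            {
                if (td.GetAttributeValue("class", "null") == "col1")
                {
                    // First column: print the cell text.
                    Console.Write("\t " + td.InnerText);
                }
                else
                {
                    // Other columns: print the href of the link inside div.catel.
                    HtmlNode temp = td.SelectSingleNode(".//div[@class='catel']/a");
                    if (temp != null)
                    {
                        Console.Write("\t " + temp.GetAttributeValue("href", "no url"));
                    }
                }
            }
        }
        Console.ReadKey();
    }
}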
First we select the tbody node with the XPath below, but only inside a table whose class attribute contains 'table':
//table[contains(@class, 'table')]/tbody
Now we select all the child nodes named tr (table row):
./tr
The dot here means that we search from the current context node, so this finds the tr nodes directly under tbody. Then, in each tr node, we find all the td nodes with:
./td
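To illustrate why the leading dot matters, here is a minimal sketch. The two-row table is made up for the example, and the class and method names are hypothetical. In HtmlAgilityPack, an expression that starts with // is evaluated from the document root even when it is called on a child node, while ./ stays inside the current node.

```csharp
using System;
using HtmlAgilityPack;

public static class RelativeXPathSketch
{
    // Counts the matches for an XPath evaluated from the first tr
    // of a made-up two-row table (assumption: not the real page).
    public static int CountFromFirstRow(string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<table><tr><td>a</td></tr><tr><td>b</td></tr></table>");

        HtmlNode firstRow = doc.DocumentNode.SelectSingleNode("//tr");
        HtmlNodeCollection hits = firstRow.SelectNodes(xpath);

        // SelectNodes returns null rather than an empty collection when nothing matches.
        return hits == null ? 0 : hits.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountFromFirstRow("./td")); // td children of the first row only
        Console.WriteLine(CountFromFirstRow("//td")); // every td in the document
    }
}
```

With this sample, the first call should see one cell and the second should see both.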
Now we want to get the data out of each table cell. We know that the first td in a row has a class attribute equal to 'col1', so if the td has that class value we take the text inside the td node.
If the td does not have that class, we instead look for the anchor tag inside a div whose class attribute is 'catel'.
From that anchor tag we read the value of the href attribute.
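The steps above can also be tried offline against a small HTML string. The following is only a sketch: the sample markup is an assumption modeled on the XPath expressions in this answer (the real page may differ), and `TableSketch` and `ReadRows` are hypothetical names. Unlike the code above, it collects every matching anchor in a row rather than only the first one per cell.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TableSketch
{
    // Assumed markup, shaped to match the XPath expressions in this answer.
    public const string SampleHtml = @"
<table class='table'>
  <tbody>
    <tr>
      <td class='col1'>Brand A</td>
      <td><div class='catel'><a href='/brand/a/tv.html'>TV</a></div></td>
    </tr>
  </tbody>
</table>";

    // Returns one (name, links) pair per row; rows without a col1 cell are skipped.
    public static List<KeyValuePair<string, List<string>>> ReadRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<KeyValuePair<string, List<string>>>();

        // SelectNodes returns null when nothing matches, so guard before looping.
        var rows = doc.DocumentNode.SelectNodes("//table[contains(@class, 'table')]/tbody/tr");
        if (rows == null) return result;

        foreach (HtmlNode tr in rows)
        {
            HtmlNode nameCell = tr.SelectSingleNode("./td[@class='col1']");
            if (nameCell == null) continue;

            var links = new List<string>();
            var anchors = tr.SelectNodes("./td//div[@class='catel']/a");
            if (anchors != null)
            {
                foreach (HtmlNode a in anchors)
                {
                    links.Add(a.GetAttributeValue("href", ""));
                }
            }
            result.Add(new KeyValuePair<string, List<string>>(nameCell.InnerText.Trim(), links));
        }
        return result;
    }

    public static void Main()
    {
        foreach (var row in ReadRows(SampleHtml))
        {
            Console.WriteLine(row.Key + ": " + string.Join(", ", row.Value));
        }
    }
}
```

Returning an empty list when the table is missing keeps callers from hitting a NullReferenceException, which is the usual symptom when SelectNodes finds nothing.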
Upvotes: 1
Reputation: 129
Use this XPath instead:
document.DocumentNode.SelectNodes("//div[@class='col-sm-8']/table[contains(@class, 'table')]/tbody/tr")
Upvotes: 0