Reputation: 57
My XPath is not working.
I am trying to get the URL of the "Next" link at the bottom of Google's search results page,
but I cannot reach it with my XPath.
Please help me correct it. Also, what should go in place of ??
HtmlWeb hw = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = hw.Load("http://www.google.com/search?q=seo");
HtmlNodeCollection linkNodes = doc.DocumentNode.SelectNodes("//*[@id='pnnext']");
foreach (HtmlNode linkNode in linkNodes)
{
    HtmlAttribute link = linkNode.Attributes["href"];
    MessageBox.Show(link.Value);
}
Upvotes: 3
Views: 2847
Reputation: 40576
The weird thing here is that somehow HtmlAgilityPack does not recognize the id
attribute of the "Next" link.
This could be a bug in HtmlAgilityPack; you can post it in the HAP Issue Tracker.
However, in the meantime I found this workaround:
- Select the table with id="nav". For this element the id is correctly recognized.
- Take the row (tr) in the table and the last td of it (using the XPath last() function).
- Select the a element inside the td we obtained at the previous step.
Long story short, here's the code:
var doc = new HtmlWeb().Load("http://www.google.com/search?q=seo");
var nextLink = doc.DocumentNode
    .SelectSingleNode("//table[@id='nav']/tr/td[last()]/a");
// HtmlNode exposes GetAttributeValue(name, default); "err" is returned if href is missing.
Console.WriteLine(nextLink.GetAttributeValue("href", "err"));
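To check the XPath itself without depending on what Google returns, here is a minimal offline sketch. The HTML fragment is invented for illustration and is not Google's real markup; the class name `XPathLastDemo` is likewise hypothetical. It only shows that `td[last()]` picks the final cell of the row and that a null check guards against a non-matching XPath.

```csharp
using System;
using HtmlAgilityPack;

class XPathLastDemo
{
    static void Main()
    {
        // Hypothetical fragment, made up for this example; not Google's actual output.
        const string html =
            "<table id='nav'><tr>" +
            "<td><a href='/prev'>Previous</a></td>" +
            "<td><a href='/next'>Next</a></td>" +
            "</tr></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Same XPath as in the workaround: the anchor inside the last cell of the row.
        var nextLink = doc.DocumentNode
            .SelectSingleNode("//table[@id='nav']/tr/td[last()]/a");

        // SelectSingleNode returns null when nothing matches, so check before use.
        if (nextLink == null)
            Console.WriteLine("XPath matched nothing");
        else
            Console.WriteLine(nextLink.GetAttributeValue("href", "err"));
    }
}
```

Running the same XPath against a saved copy of the real response is a quick way to see whether the markup or the expression is at fault.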
After Simon's comment I checked this again and the conclusion is that this is not a bug in HTML Agility Pack; the id="pnnext" attribute is only present when the request is made by a browser (perhaps depending on the User-Agent header value). When doing an HttpWebRequest from code, this is how the "Next" link appears in the output:
<a href="/search?q=seo&hl=en&ie=UTF-8&[...]" style="text-align:left">
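If the markup really does depend on the User-Agent header, one thing to try is sending a browser-like value through the `UserAgent` property of `HtmlWeb`. The sketch below is an assumption-laden experiment, not a confirmed fix: the User-Agent string is just an example value, the class name is hypothetical, and nothing guarantees that Google will return id="pnnext" for it. For that reason it falls back to the table-based XPath from the workaround.

```csharp
using System;
using HtmlAgilityPack;

class NextLinkWithUserAgent
{
    static void Main()
    {
        var web = new HtmlWeb
        {
            // Assumption: any browser-like value may do; this exact string is only an example.
            UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      + "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
        };

        var doc = web.Load("http://www.google.com/search?q=seo");

        // Try the id first; if it is absent, fall back to the last cell of the nav table.
        var next = doc.DocumentNode.SelectSingleNode("//*[@id='pnnext']")
                   ?? doc.DocumentNode.SelectSingleNode("//table[@id='nav']/tr/td[last()]/a");

        if (next == null)
        {
            Console.WriteLine("No \"Next\" link found in the returned markup.");
        }
        else
        {
            // Attribute values come back entity-encoded (&amp;), so decode before use.
            Console.WriteLine(HtmlEntity.DeEntitize(next.GetAttributeValue("href", "")));
        }
    }
}
```

The href is relative (it starts with /search), so it still has to be combined with the site's base address before it can be requested.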
Upvotes: 4