Suresh Sharma

Reputation: 57

HtmlAgilityPack XPath not working

My XPath is not working.

I am trying to get the URL of the "Next" link at the bottom of a Google.com results page.

However, I am unable to reach the URL using XPath.

Please help me correct my XPath. Also, tell me what should go in place of ??

HtmlWeb hw = new HtmlWeb();

HtmlAgilityPack.HtmlDocument doc = hw.Load("http://www.google.com/search?q=seo");
HtmlNodeCollection linkNodes = doc.DocumentNode.SelectNodes("//*[@id='pnnext']");

foreach (HtmlNode linkNode in linkNodes)
{
    HtmlAttribute link = linkNode.Attributes["href"];
    MessageBox.Show(link.Value);
}

Upvotes: 3

Views: 2847

Answers (1)

Cristian Lupascu

Reputation: 40576

The weird thing here is that HtmlAgilityPack somehow does not recognize the id attribute of the "Next" link.

This could be a bug in HtmlAgilityPack; you could report it on the HAP issue tracker.

In the meantime, however, I found this workaround:

  • Find the table that contains the paging elements (the table with id="nav"). The id of this element is recognized correctly.
  • Take the first (and only) tr in the table, then the last td in it (using the XPath last() function).
  • Take the a element inside the td obtained in the previous step.

Long story short, here's the code:

var doc = new HtmlWeb().Load("http://www.google.com/search?q=seo");

var nextLink = doc.DocumentNode
    .SelectSingleNode("//table[@id='nav']/tr/td[last()]/a");

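// HtmlNode exposes GetAttributeValue(name, default); "err" is printed if the node or its href is missing.
Console.WriteLine(nextLink?.GetAttributeValue("href", "err") ?? "err");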
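
To see what this XPath selects without depending on Google's live markup, it can be run against a small hand-written fragment. The HTML below is invented for illustration (the class name XPathDemo and the link targets are hypothetical) and is far simpler than the real results page, so treat it only as a sketch of how td[last()] picks the final cell.

```csharp
using System;
using HtmlAgilityPack;

class XPathDemo
{
    static void Main()
    {
        // Hypothetical, simplified stand-in for the paging table on the results page.
        const string html =
            "<table id='nav'><tr>" +
            "<td><a href='/search?q=seo&amp;start=0'>1</a></td>" +
            "<td><a href='/search?q=seo&amp;start=10'>2</a></td>" +
            "<td><a href='/search?q=seo&amp;start=10'>Next</a></td>" +
            "</tr></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Same expression as in the workaround: the a element in the last td of the row.
        var nextLink = doc.DocumentNode
            .SelectSingleNode("//table[@id='nav']/tr/td[last()]/a");

        // SelectSingleNode returns null when nothing matches, so guard before reading the node.
        Console.WriteLine(nextLink == null ? "not found" : nextLink.InnerText);
    }
}
```

Note that the expression uses /tr directly under the table; this matches the fragment as written, whereas markup that contains an explicit tbody element would need that step added to the path.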

Update

After Simon's comment I checked this again, and the conclusion is that this is not a bug in HTML Agility Pack: the id="pnnext" attribute is only present when the request is made by a browser (perhaps depending on the User-Agent header value). When the request is made from code with an HttpWebRequest, this is how the "Next" link appears in the output:

<a href="/search?q=seo&amp;hl=en&amp;ie=UTF-8&amp[...]" style="text-align:left">
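
If the missing id really does depend on the User-Agent header, one thing to try is sending a browser-like value through the HtmlWeb.UserAgent property. This is a minimal sketch, not a confirmed fix: the User-Agent string below is an arbitrary example, and Google may still serve different markup or refuse the request. The code also falls back to the position-based XPath and resolves the relative href into an absolute URL.

```csharp
using System;
using HtmlAgilityPack;

class NextLinkWithUserAgent
{
    static void Main()
    {
        var web = new HtmlWeb
        {
            // Example browser-like value (an assumption); any current browser string could be tried.
            UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
        };

        var doc = web.Load("http://www.google.com/search?q=seo");

        // Try the id first, then fall back to the position-based XPath from the workaround.
        var nextLink = doc.DocumentNode.SelectSingleNode("//a[@id='pnnext']")
            ?? doc.DocumentNode.SelectSingleNode("//table[@id='nav']/tr/td[last()]/a");

        if (nextLink == null)
        {
            Console.WriteLine("Next link not found; the markup may differ for this request.");
            return;
        }

        // The href is relative and may be entity-encoded (&amp;), so decode it
        // and resolve it against the site root.
        string href = HtmlEntity.DeEntitize(nextLink.GetAttributeValue("href", ""));
        Console.WriteLine(new Uri(new Uri("http://www.google.com"), href));
    }
}
```

Checking for the id before the positional path keeps the code working in either case, whichever markup the server decides to return.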

Upvotes: 4
