htmlagilitypack xpath incorrect

Question

My XPath is not working.

I am trying to get the URLs from Google.com's search result list into a string list.

But I am unable to reach the URLs using XPath.

Please help me correct my XPath. Also, what should go in place of the ????????? in the code below?

HtmlWeb hw = new HtmlWeb();
List<string> urls = new List<string>();
HtmlAgilityPack.HtmlDocument doc = hw.Load("http://www.google.com/search?q=" +txtURL.Text.Replace(" " , "+"));
HtmlNodeCollection linkNodes = doc.DocumentNode.SelectNodes("//div[@class='f kv']");
foreach (HtmlNode linkNode in linkNodes)
{
    HtmlAttribute link = linkNode.Attributes["?????????"];
    urls.Add(link.Value);

}
for (int i = 0; i <= urls.Count - 1; i++)
{
    if (urls.ElementAt(i) != null)
    {
        if (IsValid(urls.ElementAt(i)) != true)
        {
            grid.Rows.Add(urls.ElementAt(i));

        }
    }
}

Cristian Lupascu · Accepted Answer

The correct XPath is "//div[@class='kv']/cite". The f class you see in the browser's element inspector is (probably) added by JavaScript after the page is rendered, so it is not present in the HTML that HtmlAgilityPack downloads.

Also, the link text is not in an attribute; you can get it using the InnerText property of the cite element(s) obtained in the earlier step.

I changed these lines and it works:

var linkNodes = doc.DocumentNode.SelectNodes("//div[@class='kv']/cite");

foreach (HtmlNode linkNode in linkNodes)
{
    urls.Add(linkNode.InnerText);
}

There's a caveat, though: some links are trimmed (you'll see a ... in the middle).
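
One more thing to watch for: SelectNodes returns null rather than an empty collection when the XPath matches nothing, so the foreach would throw a NullReferenceException if the markup ever changes. Below is a minimal sketch of a guarded version. The inline sample HTML and the ExtractCiteTexts helper name are assumptions made up for illustration; they are not Google's actual page or part of the original code.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class Program
{
    // Hypothetical helper: collects the text of every <cite> directly under div.kv.
    static List<string> ExtractCiteTexts(HtmlDocument doc)
    {
        var urls = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection linkNodes =
            doc.DocumentNode.SelectNodes("//div[@class='kv']/cite");
        if (linkNodes == null)
        {
            return urls;
        }

        foreach (HtmlNode linkNode in linkNodes)
        {
            // DeEntitize turns entities such as &amp; back into plain characters.
            urls.Add(HtmlEntity.DeEntitize(linkNode.InnerText).Trim());
        }
        return urls;
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        // Made-up sample markup standing in for a downloaded results page.
        doc.LoadHtml("<div class='kv'><cite>www.example.com/a?x=1&amp;y=2</cite></div>");

        foreach (string url in ExtractCiteTexts(doc))
        {
            Console.WriteLine(url);
        }
    }
}
```

In the original code you would pass the document returned by hw.Load(...) to the helper instead of the inline sample. Returning an empty list when nothing matches keeps the later grid-filling loop working without extra null checks.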
