gingray

Reputation: 401

Troubles with HtmlAgilityPack

I can't figure out what is going wrong. I just created a project to test HtmlAgilityPack, and this is what I've got:

using System;
using System.Collections.Generic;
using System.Text;
using HtmlAgilityPack;


namespace parseHabra
{
    class Program
    {
        static void Main(string[] args)
        {
            HTTP net = new HTTP(); // some HTTP wrapper
            string result = net.MakeRequest("http://stackoverflow.com/", null);
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(result);

            //Get all summary blocks
            HtmlNodeCollection news = doc.DocumentNode.SelectNodes("//div[@class=\"summary\"]");
            foreach (HtmlNode item in news)
            {
                string title = String.Empty;
                // The trouble is here: for every item I get the same value,
                // every time
                title = item.SelectSingleNode("//a[@class=\"question-hyperlink\"]").InnerText.Trim();
                Console.WriteLine(title);
            }
            Console.ReadLine();
        }
    }
}

It looks like the XPath is evaluated against the whole document rather than against each node I've selected. Any suggestions as to why? Thanks in advance.

Upvotes: 0

Views: 279

Answers (2)

Jeff Mercado

Reputation: 134801

I'd rewrite your XPath as a single query that finds all the question titles, rather than finding the summaries and then the titles. Chris' answer points out the problem, which could easily have been avoided.

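using System.Linq;      // needed for Select
using HtmlAgilityPack;

var web = new HtmlWeb();
var doc = web.Load("http://stackoverflow.com");

// One query: question links nested inside each question-summary div
var xpath = "//div[starts-with(@id,'question-summary-')]//a[@class='question-hyperlink']";

// Note: SelectNodes returns null when nothing matches
var questionTitles = doc.DocumentNode
    .SelectNodes(xpath)
    .Select(a => a.InnerText.Trim());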

Upvotes: 1

Chris Taylor

Reputation: 53699

I have not tried your code, but from a quick look I suspect the problem is that // searches from the root of the entire document, not from the current element as I guess you are expecting.

Try putting a . before the //, which makes the path relative to the current node:

".//a[@class=\"question-hyperlink\"]"

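To make the difference concrete, here is a minimal sketch that runs both forms of the path against a small inline HTML string. The markup, the class name XPathScopeDemo and the helper Titles are made up for illustration (they are not the real Stack Overflow page or part of HtmlAgilityPack); only LoadHtml, SelectNodes, SelectSingleNode and InnerText come from the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class XPathScopeDemo
{
    // Hypothetical stand-in markup, not the real Stack Overflow page.
    const string Html =
        "<div class='summary'><a class='question-hyperlink'>First</a></div>" +
        "<div class='summary'><a class='question-hyperlink'>Second</a></div>";

    // Runs linkXPath against each summary div and collects the link text.
    public static List<string> Titles(string linkXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);

        var titles = new List<string>();
        var summaries = doc.DocumentNode.SelectNodes("//div[@class='summary']");
        if (summaries == null) return titles; // SelectNodes returns null on no match

        foreach (HtmlNode item in summaries)
        {
            HtmlNode link = item.SelectSingleNode(linkXPath);
            if (link != null) titles.Add(link.InnerText.Trim());
        }
        return titles;
    }

    static void Main()
    {
        // Absolute path: restarts at the document root on every iteration.
        Console.WriteLine(string.Join(", ", Titles("//a[@class='question-hyperlink']")));
        // Relative path: searches only below the current div.
        Console.WriteLine(string.Join(", ", Titles(".//a[@class='question-hyperlink']")));
    }
}
```

With this markup, the absolute path should give "First" for both divs (the symptom in the question), while the relative path should give "First" and then "Second".
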
Upvotes: 2
