Html agility xpath get following node if

Question

I have an HTML document structured as:

I want to retrieve, in a separate List, all the <p> nodes until the next section, that is, until the next <h3>.

For now I'm using:

for (int paragraph = xx; paragraph <= yy; paragraph++)
{
    nameActual = "sect" + paragraph;
    nameNext = "sect" + (paragraph + 1);
    HtmlNodeCollection NodeOfParagraph = doc.DocumentNode.SelectNodes(String.Format("//h3[a[@name='{0}']]/following-sibling::p[following::h3/a[@name='{1}']]", nameActual, nameNext));

    // Multiple actions on NodeOfParagraph
}

So I select the first <h3> that contains an <a> whose name is the value I'm looking for, and then select all the following <p> nodes that are themselves followed by an <h3> containing an <a> with the next name value.

It works, but it takes a really long time. I suppose this is because, for each node, the query tests all the other nodes for their value.

How can I improve the performance of my query?

Keith Hall · Accepted Answer

You could do the following:

1. Find all the section definitions and store them in a list.
2. Loop through the section definitions:
   - get all the nodes between this section and the next one (or the end of the document if there are no more section definitions) by specifying the exact name of the next section in the query.

var doc = new HtmlDocument();
doc.Load(@"path\to\file.html");
var sects = doc.DocumentNode.SelectNodes("//h3[a[starts-with(@name, 'sect')]]");

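for (var index = 0; index < sects.Count; index++)
{
    var isLast = (index == sects.Count - 1);
    // Start at the current h3 (the context node) and look only at its following sibling paragraphs.
    var xpath = "./following-sibling::p";
    // Unless this is the last section, keep only paragraphs whose nearest following h3 is the next section.
    if (!isLast)
        xpath += string.Format("[following-sibling::h3[1][a/@name = '{0}']]", sects[index + 1].SelectSingleNode("./a").Attributes["name"].Value);
    var collection = sects[index].SelectNodes(xpath);
    // Process collection here; SelectNodes returns null when nothing matches.
}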

This approach has the following advantages:

- it does not try to find a section number that doesn't exist
- it uses the context node (starting the query with ./), so unnecessary parts of the document are not searched
- it stops at the next h3 (h3[1]), so unnecessary parts of the document are not searched
- it searches only siblings and not descendants (following-sibling:: instead of following::)
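
To see the approach end to end, here is a minimal sketch that runs the same queries against a small in-memory document. The sample markup, the SectionSplitter class, and the Split method are assumptions made up for illustration (the question's real HTML is not shown); the sketch needs the HtmlAgilityPack NuGet package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class SectionSplitter
{
    // Hypothetical sample: h3 headings and p paragraphs are siblings,
    // which is what the following-sibling:: queries rely on.
    public const string SampleHtml =
        "<h3><a name=\"sect1\">One</a></h3><p>a</p><p>b</p>" +
        "<h3><a name=\"sect2\">Two</a></h3><p>c</p>" +
        "<h3><a name=\"sect3\">Three</a></h3><p>d</p><p>e</p>";

    // Returns (section name, paragraph texts) pairs in document order.
    public static List<KeyValuePair<string, List<string>>> Split(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<KeyValuePair<string, List<string>>>();
        var sects = doc.DocumentNode.SelectNodes("//h3[a[starts-with(@name, 'sect')]]");
        if (sects == null)
            return result; // SelectNodes returns null when nothing matches

        for (var index = 0; index < sects.Count; index++)
        {
            var name = sects[index].SelectSingleNode("./a").Attributes["name"].Value;

            // All paragraphs that are following siblings of the current h3.
            var xpath = "./following-sibling::p";
            if (index < sects.Count - 1)
            {
                // Keep only those whose nearest following h3 is the next section.
                var next = sects[index + 1].SelectSingleNode("./a").Attributes["name"].Value;
                xpath += string.Format("[following-sibling::h3[1][a/@name = '{0}']]", next);
            }

            var paragraphs = sects[index].SelectNodes(xpath);
            var texts = paragraphs == null
                ? new List<string>()
                : paragraphs.Select(p => p.InnerText).ToList();
            result.Add(new KeyValuePair<string, List<string>>(name, texts));
        }
        return result;
    }

    public static void Main()
    {
        foreach (var pair in Split(SampleHtml))
            Console.WriteLine("{0}: {1}", pair.Key, string.Join(", ", pair.Value));
    }
}
```

For this sample, the queries should group paragraphs a and b under sect1, c under sect2, and d and e under sect3. If the headings and paragraphs in the real document are not siblings (for example, each section is wrapped in its own div), the queries would need adjusting.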
