rlandster

Reputation: 7825

XPath query to get nth instance of an element

There is an HTML file (whose contents I do not control) that has several input elements all with the same fixed id attribute of "search_query". The contents of the file can change, but I know that I always want to get the second input element with the id attribute "search_query".

I need an XPath expression to do this. I tried //input[@id="search_query"][2] but that does not work. Here is an example XML string where this query failed:

<div>
  <form>
    <input id="search_query" />
  </form>
</div>

<div>
  <form>
    <input id="search_query" />
  </form>
</div>

<div>
  <form>
    <input id="search_query" />
  </form>
</div>

Keep in mind that the above is merely an example. The surrounding HTML can be quite different, and the input elements can appear anywhere, with no consistent document structure (except that I am guaranteed there will always be at least two input elements with an id attribute of "search_query").

What is the correct XPath expression?

Upvotes: 194

Views: 235251

Answers (2)

rlandster

Reputation: 7825

This seems to work:

/descendant::input[@id="search_query"][2]

I got this from "XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition" by Michael Kay.

There is also a note in the "Abbreviated Syntax" section of the XML Path Language specification that provided a clue.

Compare what works:

/descendant::input[@id="search_query"][2]

to what does not work:

//input[@id="search_query"][2]

The difference is /descendant:: vs //. Consider what "//" means:

A "//" at the beginning of a path expression is an abbreviation for the initial steps (fn:root(self::node()) treat as document-node())/descendant-or-self::node()

And consider the difference between descendant and descendant-or-self:

The path expression //para[1] does not mean the same as the path expression /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their respective parents.

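The reason is where the [2] predicate gets applied. //input[@id="search_query"][2] expands to /descendant-or-self::node()/child::input[@id="search_query"][2], so the predicate belongs to the child:: step and the "context position" is counted separately among the matching input children of each parent. With /descendant::input[@id="search_query"][2] there is a single step, so the position is counted once across all matching descendants: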

The context position is the position of the context item within the sequence of items currently being processed. It changes whenever the context item changes.
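To make the two readings concrete, here is a minimal Python sketch that emulates them by hand with the standard library's xml.etree.ElementTree instead of running the XPath expressions themselves. The <root> wrapper and the helper names second_per_parent and second_in_document are assumptions added for illustration; they are not part of the question or of any XPath API.

```python
import xml.etree.ElementTree as ET

# The question's sample, wrapped in a hypothetical <root> element so that it
# parses as a single document.
SAMPLE = """
<root>
  <div><form><input id="search_query" /></form></div>
  <div><form><input id="search_query" /></form></div>
  <div><form><input id="search_query" /></form></div>
</root>
"""


def is_match(elem):
    """True for <input id="search_query">."""
    return elem.tag == "input" and elem.get("id") == "search_query"


def second_per_parent(root):
    """Emulates //input[@id="search_query"][2].

    The position is counted among the matching children of each parent.
    """
    hits = []
    for parent in root.iter():  # every element, including root itself
        matches = [child for child in parent if is_match(child)]
        if len(matches) >= 2:
            hits.append(matches[1])
    return hits


def second_in_document(root):
    """Emulates /descendant::input[@id="search_query"][2].

    The position is counted once, over all matches in document order.
    """
    matches = [elem for elem in root.iter("input") if is_match(elem)]
    return matches[1:2]  # empty list if there are fewer than two matches


root = ET.fromstring(SAMPLE)
print(len(second_per_parent(root)))   # 0: no form has two matching inputs
print(len(second_in_document(root)))  # 1: the second input in the document
```

Each form in the sample holds only one input, so the per-parent reading finds nothing, which matches the failure described in the question.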

Upvotes: 32

Dimitre Novatchev

Reputation: 243449

This is a FAQ:

//somexpression[$N]

means "Find every node selected by //somexpression that is the $Nth such child of its parent".

What you want is:

(//input[@id="search_query"])[2]

Remember: The [] operator has higher precedence (priority) than the // abbreviation.
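If the XPath engine at hand does not accept the parenthesized form, the same "collect everything first, then pick by position" idea can be done in the host language. The sketch below assumes Python's standard xml.etree.ElementTree, which supports only a limited XPath subset; the markup and the name attributes are hypothetical and exist only so the result can be identified.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the inputs sit at different depths, and the "name"
# attributes are there only to tell the elements apart.
DOC = """
<body>
  <input id="search_query" name="first" />
  <div>
    <form>
      <input id="search_query" name="second" />
    </form>
  </div>
  <p><input id="search_query" name="third" /></p>
</body>
"""

root = ET.fromstring(DOC)

# findall returns every match in document order ...
matches = root.findall(".//input[@id='search_query']")

# ... so the second instance is at index 1 (Python lists are 0-based,
# while XPath positions are 1-based).
second = matches[1]
print(second.get("name"))  # second
```

This mirrors what the parentheses do in (//input[@id="search_query"])[2]: the whole node set is built first, and only then is a position applied to it.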

Upvotes: 368
