directory

Reputation: 3167

XPath: select items except the last one, using contains()

I want to select the following HTML items (action, comedy), but not the last one (tags).

To select all of them, the following code works:

//*[@id="video-tags"]//a[contains(@href,'tags')]

But to select everything except the last one (tags), the following code does not work:

//*[@id="video-tags"]//a[contains(@href,'tags') not(position() > last() -1)]

The HTML:

<ul id="video-tags">
        <li>Uploader: </li>
        <li class="profile_name"><a href="/profiles/wilco">wilco</a></li>
        <li><em>Tagged: </em></li>
        <li><a href="/tags/action">action</a>, </li>
        <li><a href="/tags/comedy">comedy</a>, </li>
        <li>more <a href="/tags/"><strong>tags</strong></a></li>
</ul>

Thanks in advance

Nick

Upvotes: 11

Views: 11302

Answers (2)

Mohit Gupta

Reputation: 81

Try this:

(//ul[@id="video-tags"]//a[contains(@href,'tags')]/text())

Upvotes: 0

Ian Roberts

Reputation: 122414

Aside from the syntax error (you need an and, i.e. contains(@href,'tags') and not(position()...)), you're tripping up on a subtlety of how // is defined.

The XPath //a[position() < last()] will not give you every a except the last one, it will give you every a that is not the last a inside its respective parent element. Since each li contains at most one a, every a is the last a in its respective parent, so this test will match nothing at all.
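The per-parent behaviour can be checked with a minimal sketch using Python's standard-library xml.etree.ElementTree. That choice of tool is an assumption, not something from the question. ElementTree supports only a small XPath subset (no position() < last() comparison, no contains()), so the sketch uses the positional predicates it does support, [last()] and [last()-1], which are likewise evaluated among the a children of each parent.

```python
import xml.etree.ElementTree as ET

# The markup from the question; it is well-formed, so it parses as XML.
MARKUP = """
<ul id="video-tags">
  <li>Uploader: </li>
  <li class="profile_name"><a href="/profiles/wilco">wilco</a></li>
  <li><em>Tagged: </em></li>
  <li><a href="/tags/action">action</a>, </li>
  <li><a href="/tags/comedy">comedy</a>, </li>
  <li>more <a href="/tags/"><strong>tags</strong></a></li>
</ul>
"""

root = ET.fromstring(MARKUP)

# Every a element below the ul.
every_a = root.findall(".//a")

# [last()] is tested among the a children of each parent li,
# so every a is "the last one" in its own parent.
last_in_parent = root.findall(".//a[last()]")

# [last()-1] asks for the second-to-last a of each parent;
# no li holds two a elements, so nothing qualifies.
second_to_last_in_parent = root.findall(".//a[last()-1]")

print(len(every_a), len(last_in_parent), len(second_to_last_in_parent))
```

Because the position is counted per parent and not across the whole result, a "not the last one" test attached directly to the a step has nothing to exclude here.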

You can achieve what you want by wrapping most of the expression in parentheses and putting the position check in a separate predicate:

(//*[@id="video-tags"]//a[contains(@href,'tags')])[position() < last()]

The parentheses cause the final predicate to apply to the node set selected by the expression as a whole, rather than just to the a location step, i.e. it will first find all the a elements whose href contains "tags", then return all but the last of these selected elements in document order.
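As a rough illustration of "select first, then drop the last", here is a sketch using Python's standard-library xml.etree.ElementTree (an assumed tool, not part of the question). ElementTree cannot evaluate contains() or a parenthesised expression, so the href filter and the final position check are emulated in plain Python; a full XPath 1.0 engine should accept the expression above as written.

```python
import xml.etree.ElementTree as ET

# The markup from the question; it is well-formed, so it parses as XML.
MARKUP = """
<ul id="video-tags">
  <li>Uploader: </li>
  <li class="profile_name"><a href="/profiles/wilco">wilco</a></li>
  <li><em>Tagged: </em></li>
  <li><a href="/tags/action">action</a>, </li>
  <li><a href="/tags/comedy">comedy</a>, </li>
  <li>more <a href="/tags/"><strong>tags</strong></a></li>
</ul>
"""

root = ET.fromstring(MARKUP)

# Step 1: the parenthesised part. Collect every a whose href contains
# "tags", in document order (iter() walks the tree in document order).
tag_links = [a for a in root.iter("a") if "tags" in a.get("href", "")]

# Step 2: the trailing predicate. position() < last() over the whole
# selection means "all but the final element", i.e. a slice.
all_but_last = tag_links[:-1]

hrefs = [a.get("href") for a in all_but_last]
print(hrefs)
```

The point of the two steps is that the position is counted over the combined selection, not inside each li.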


Technical explanation - the definition of // in XPath is that it is a shorthand for /descendant-or-self::node()/ (including the slashes), which is a location step that gives you this node and all its descendant nodes. So //a means /descendant-or-self::node()/child::a, and //a[something] means /descendant-or-self::node()/child::a[something] - the predicate applies to the child:: step, not the descendant-or-self:: one. If you want to apply a predicate to the descendant search then you should use the descendant:: axis explicitly - /descendant::a[something].

Upvotes: 22
