Nandha

Reputation: 33

XPath to select between tags

This question may already have been asked here; I searched for hours but could not figure out how to select the nodes between two tags.

I am a novice in XPath selection. I am trying to select all the li elements under a specific h6. The challenge is that all the h6 elements have the same class attribute, so I cannot find a unique identifier. When I type //h6/ui/li/span, it gets all the li elements in the document, under every h6.

<h6> Text1 </h6>
<ul>
    <li><span> list1 </span></li>
    <li><span> list2 </span></li>
    <li><span> list3 </span></li>
</ul>
<div> </div>
<h6> Text 2</h6>

I am not sure how to proceed. This is my first time writing a question here, so let me know if more information is required.

Any help is appreciated.

Upvotes: 2

Views: 953

Answers (3)

Nandha

Reputation: 33

The following XPath worked for my requirement:

xpath("//h6[text()='Text 1']/following::ul[1]/li/span/text()")

I selected the h6 by its text and then took the first ul that follows it.
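A minimal sketch of that expression run through lxml, in case it helps someone reproduce it. It assumes the fragment from the question is wrapped in a hypothetical root element, that the heading text is exactly 'Text 1' with no padding spaces (text()= is an exact comparison), and it adds a second, made-up ul after the second heading to show that it is not picked up.

```python
from lxml import etree

# Assumption: the question's fragment wrapped in a hypothetical <root> element.
# The second <ul> is made up, only to show that it is left out of the result.
doc = etree.XML("""
<root>
<h6>Text 1</h6>
<ul>
    <li><span> list1 </span></li>
    <li><span> list2 </span></li>
    <li><span> list3 </span></li>
</ul>
<div> </div>
<h6>Text 2</h6>
<ul>
    <li><span> other </span></li>
</ul>
</root>
""")

# Select the heading by its exact text, then only the first <ul> after it.
expr = "//h6[text()='Text 1']/following::ul[1]/li/span/text()"
items = [t.strip() for t in doc.xpath(expr)]
print(items)  # ['list1', 'list2', 'list3']
```

If the real headings carry surrounding whitespace, replacing text()='Text 1' with normalize-space()='Text 1' should make the match more tolerant.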

Upvotes: 0

Eric Chow

Reputation: 479

First, you did not tell us which version of XPath you are using. There are XPath 1.0, 2.0, and 3.0/3.1. They differ, and each builds on the previous version.

For XPath 1.0

You can refer to the question "XPath between two elements".

The idea is to build two node sets: one runs from <h6> Text1 </h6> to the end (following-sibling), the other from <h6> Text 2</h6> back to the start (preceding-sibling). The result is the intersection of the two sets.

XPath 1.0 has no intersection operator, so the last predicate tests membership by counting: it takes the union of the current node and the first set and compares its size with the size of the first set alone. If the two counts are equal, the node was already in that set.

The following is the code in Python for your problem:

from lxml import etree
root = etree.XML("""
<root>
<h6> Text 1 </h6>
<ul>
    <li><span> list1 </span></li>
    <li><span> list2 </span></li>
    <li><span> list3 </span></li>
</ul>
<div> </div>
<h6> Text 2 </h6>
</root>
""")

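# Siblings after the first heading; reused twice in the predicate below.
after_first = "/root/h6[text()=' Text 1 ']/following-sibling::*"

# Keep a sibling before the second heading only if adding it to the
# "after first heading" set does not change that set's size.
between = root.xpath(
    "h6[text()=' Text 2 ']/preceding-sibling::*"
    f"[count(.|{after_first})=count({after_first})]"
)
print([el.tag for el in between])  # ['ul', 'div']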

For XPath 2.0

There is an intersect keyword, so the previous XPath can be simplified to:

h6[text()=' Text 1 ']/following-sibling::* intersect h6[text()=' Text 2 ']/preceding-sibling::*
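Since lxml implements only XPath 1.0, it cannot evaluate intersect directly. As a rough sketch, the same result can be obtained by running the two halves as separate XPath 1.0 queries and intersecting them in Python. The variable names are my own, and the approach relies on lxml handing back the same Python object for the same node while a reference to it is alive.

```python
from lxml import etree

root = etree.XML("""
<root>
<h6> Text 1 </h6>
<ul>
    <li><span> list1 </span></li>
    <li><span> list2 </span></li>
    <li><span> list3 </span></li>
</ul>
<div> </div>
<h6> Text 2 </h6>
</root>
""")

# The two operands of the XPath 2.0 "intersect", evaluated separately.
after_first = root.xpath("h6[text()=' Text 1 ']/following-sibling::*")
before_second = set(root.xpath("h6[text()=' Text 2 ']/preceding-sibling::*"))

# Iterate over the first list so the result stays in document order.
between = [el for el in after_first if el in before_second]
print([el.tag for el in between])  # ['ul', 'div']
```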

For XPath 3.0/3.1

You can use variable declarations to find the start and end positions first, and then select the elements between the two positions.

let $x := count(h6[text()=' Text 1 ']/preceding-sibling::*) + 1,
    $y := count(h6[text()=' Text 2 ']/preceding-sibling::*) + 1
return *[position() > $x and position() < $y]

The XPath 3.x solution should be much faster, because its time complexity is only O(n), whereas the XPath 2.0 and 1.0 solutions can be O(n²).

Again, I do not know your XPath version. lxml supports only XPath 1.0; for other packages, check their documentation.
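Because lxml is limited to XPath 1.0, the position-based idea from the XPath 3.x section can instead be sketched in plain Python. The sketch below uses only the standard library; siblings_between is a hypothetical helper of my own, not part of any package. It makes a single pass over the children, which is where the linear cost comes from.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<root>
<h6> Text 1 </h6>
<ul>
    <li><span> list1 </span></li>
    <li><span> list2 </span></li>
    <li><span> list3 </span></li>
</ul>
<div> </div>
<h6> Text 2 </h6>
</root>
""")

def siblings_between(parent, start_text, end_text):
    """Return the children of parent strictly between two <h6> headings.

    Hypothetical helper: headings are matched on their stripped text.
    """
    children = list(parent)
    start = end = None
    for i, child in enumerate(children):
        if child.tag != "h6":
            continue
        text = (child.text or "").strip()
        if start is None and text == start_text:
            start = i          # position of the opening heading
        elif start is not None and text == end_text:
            end = i            # position of the closing heading
            break
    if start is None or end is None:
        return []              # one of the headings was not found
    return children[start + 1:end]

between = siblings_between(root, "Text 1", "Text 2")
print([el.tag for el in between])  # ['ul', 'div']
```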

Upvotes: 3

JaSON

Reputation: 4869

Try this one to get the required output:

//ul[preceding-sibling::h6[1][normalize-space()="Text1"]]/li
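A quick way to check this expression with lxml might look like the sketch below. It assumes the question's markup wrapped in a hypothetical root element, plus a made-up second ul after the second heading. Because preceding-sibling::h6[1] is the nearest heading before each ul, that second list should be skipped.

```python
from lxml import etree

# Assumption: the question's fragment inside a hypothetical <root> element.
# The trailing <ul> is made up to show it is not selected.
doc = etree.XML("""
<root>
<h6> Text1 </h6>
<ul>
    <li><span> list1 </span></li>
    <li><span> list2 </span></li>
    <li><span> list3 </span></li>
</ul>
<div> </div>
<h6> Text 2</h6>
<ul>
    <li><span> other </span></li>
</ul>
</root>
""")

# Keep only a <ul> whose closest preceding <h6> sibling reads "Text1".
expr = '//ul[preceding-sibling::h6[1][normalize-space()="Text1"]]/li'
texts = [li.xpath("normalize-space(.)") for li in doc.xpath(expr)]
print(texts)  # ['list1', 'list2', 'list3']
```

normalize-space() trims the padding around the heading text, so the comparison does not depend on the exact whitespace in the source.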

Upvotes: 1
