Reputation: 33
This question may already have been asked here; I searched for hours but could not figure out how to select the nodes between two other nodes.
I am a novice in XPath selection. I am trying to select all the li elements under a specific h6. The challenge is that all the h6 elements have the same class attribute, so I cannot find a unique identifier. When I type //h6/ul/li/span
it gets all the li elements in the document, under every h6.
<h6> Text1 </h6>
<ul>
<li><span> list1 </span></li>
<li><span> list2 </span></li>
<li><span> list3 </span></li>
</ul>
<div> </div>
<h6> Text 2</h6>
I am not sure how to proceed. This is the first time I am writing a question here, so let me know if more information is required.
Any help appreciated
Upvotes: 2
Views: 953
Reputation: 33
The following XPath worked for my requirement:
xpath("//h6[text()='Text 1']/following::ul[1]/li/span/text()")
I selected the h6 and then the first ul that follows it.
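A minimal sketch of this approach with lxml, assuming the fragment is wrapped in a hypothetical <root> element so that it parses as XML, and with a second list added only to show that it is not selected. Because text()='Text 1' only matches when the heading text is exactly that string, the sketch swaps in normalize-space() to tolerate the surrounding whitespace seen in the question's markup; that substitution is an assumption, not part of the expression above.

```python
from lxml import etree

# Hypothetical wrapper <root>; the second <ul> is added for illustration.
doc = etree.XML("""
<root>
  <h6> Text 1 </h6>
  <ul>
    <li><span> list1 </span></li>
    <li><span> list2 </span></li>
    <li><span> list3 </span></li>
  </ul>
  <div> </div>
  <h6> Text 2 </h6>
  <ul>
    <li><span> other </span></li>
  </ul>
</root>
""")

# Pick the heading by its trimmed text, then take the first <ul> after it.
expr = "//h6[normalize-space()='Text 1']/following::ul[1]/li/span/text()"
items = [t.strip() for t in doc.xpath(expr)]
print(items)
```

Only the list directly after the chosen heading is returned; the list under the second heading is left out.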
Upvotes: 0
Reputation: 479
Firstly, you do not tell us which version of XPath you are using. There are XPath 1.0, 2.0, and 3.0/3.1. They are different; each is built on the previous version.
You can refer to XPath between two elements.
For XPath 1.0, the idea is to get two lists: one from <h6> Text1 </h6>
to the end (following-sibling), the other from <h6> Text 2</h6>
back to the start (preceding-sibling). Then take the intersection of the two lists.
XPath 1.0 has no intersection operator, so the last predicate tests membership by counting: it takes the union of the current node with the list and compares that count with the count of the list alone. If the two counts are equal, the node was already in the list.
The following is the code in Python for your problem:
from lxml import etree
root = etree.XML("""
<root>
<h6> Text 1 </h6>
<ul>
<li><span> list1 </span></li>
<li><span> list2 </span></li>
<li><span> list3 </span></li>
</ul>
<div> </div>
<h6> Text 2 </h6>
</root>
""")
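# Siblings after the first heading (the "start" list).
after_start = "/root/h6[text()=' Text 1 ']/following-sibling::*"
# Keep a preceding sibling of the second heading only if adding it to the
# "start" list does not change the count, i.e. it is already in that list.
between = root.xpath(
    "h6[text()=' Text 2 ']/preceding-sibling::*"
    f"[count(.|{after_start})=count({after_start})]"
)
print([el.tag for el in between])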
In XPath 2.0 there is an intersect
operator, so the previous XPath can be simplified to:
h6[text()=' Text 1 ']/following-sibling::* intersect h6[text()=' Text 2 ']/preceding-sibling::*
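Since lxml only evaluates XPath 1.0, the same intersection can be approximated in Python by running the two sibling queries separately and keeping the nodes that appear in both results. This is a rough sketch under that assumption, not an XPath 2.0 evaluation; the compact sample document and the names after_start, before_end and between are illustrative.

```python
from lxml import etree

# Compact, hypothetical sample: two headings with siblings around them.
root = etree.XML(
    "<root><h6>Text 1</h6><ul><li/></ul><div/><h6>Text 2</h6><p/></root>"
)

# Everything after the first heading, and everything before the second one.
after_start = root.xpath("h6[text()='Text 1']/following-sibling::*")
before_end = root.xpath("h6[text()='Text 2']/preceding-sibling::*")

# Keep nodes present in both lists, preserving document order.
between = [el for el in after_start if el in before_end]
print([el.tag for el in between])
```

The membership test relies on lxml handing back the same Python object for the same node while a reference to it is held, which is the case here because both result lists stay alive.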
In XPath 3.x, you can use a let expression
to find the start position and end position first, and then select the elements between the two positions.
let $x := count(h6[text()=' Text 1 ']/preceding-sibling::*) + 1,
$y := count(h6[text()=' Text 2 ']/preceding-sibling::*) + 1
return *[position() > $x and position() < $y]
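The position-based idea can also be sketched outside XPath, using only the Python standard library: find the index of each heading among the children, then slice strictly between the two. The helper name index_of_heading and the wrapper <root> are hypothetical, and the sketch assumes each heading label occurs once.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<root>
  <h6> Text 1 </h6>
  <ul>
    <li><span> list1 </span></li>
    <li><span> list2 </span></li>
  </ul>
  <div> </div>
  <h6> Text 2 </h6>
</root>
""")

children = list(root)  # direct child elements in document order


def index_of_heading(label):
    """Return the index of the <h6> whose trimmed text equals label."""
    for i, el in enumerate(children):
        if el.tag == "h6" and (el.text or "").strip() == label:
            return i
    raise ValueError(f"no <h6> with text {label!r}")


start = index_of_heading("Text 1")
end = index_of_heading("Text 2")

# Elements strictly between the two headings.
between = children[start + 1:end]
print([el.tag for el in between])
```

Each heading is located with a single scan of the children, so the work grows linearly with the number of siblings.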
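The solution in XPath 3.x is likely to be faster on large documents: it needs roughly one pass over the siblings (linear time), whereas the count-based XPath 1.0 expression re-evaluates the list for every candidate node and can take quadratic time in a naive implementation. The cost of intersect in XPath 2.0 depends on the processor.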
Again, I do not know your XPath version. With lxml, you can only use XPath 1.0. For other packages, please check their documentation.
Upvotes: 3
Reputation: 4869
Try this one to get the required output:
//ul[preceding-sibling::h6[1][normalize-space()="Text1"]]/li
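A short sketch of how this expression behaves in lxml, assuming the question's fragment is wrapped in a hypothetical <root> element and a second list is added under the other heading for contrast. The predicate preceding-sibling::h6[1] picks the nearest h6 before each ul, so only the list whose closest heading reads "Text1" is kept.

```python
from lxml import etree

# Hypothetical wrapper <root>; the second <ul> is added for contrast.
doc = etree.XML("""
<root>
  <h6> Text1 </h6>
  <ul>
    <li><span> list1 </span></li>
    <li><span> list2 </span></li>
  </ul>
  <div> </div>
  <h6> Text 2</h6>
  <ul>
    <li><span> other </span></li>
  </ul>
</root>
""")

expr = '//ul[preceding-sibling::h6[1][normalize-space()="Text1"]]/li'
items = doc.xpath(expr)

# Read the trimmed text of the <span> inside each selected <li>.
labels = [li.findtext("span").strip() for li in items]
print(labels)
```

normalize-space() trims the whitespace around the heading text, so the match does not depend on the exact spacing in the markup.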
Upvotes: 1