Reputation: 137
I have been trying to create an XPath expression that locates the first three Yes within p elements, stopping at the text Demarcation within the h1 element. The expression I've used in the script below locates the text of every p element, and I can't figure out how to narrow it down. Just consider the one I've created to be a placeholder.
How can I create an XPath that locates the first three Yes within p elements and nothing else?
My attempt so far:
from lxml.html import fromstring

htmldoc = """
<li>
<a>Nope</a>
<a>Nope</a>
<p>Yes</p>
<p>Yes</p>
<p>Yes</p>
<h1>Demarcation</h1>
<p>No</p>
<p>No</p>
<h1>Not this</h1>
<p>No</p>
<p>Not this</p>
</li>
"""
root = fromstring(htmldoc)
# Placeholder XPath: matches every p inside li, not just the first three
for item in root.xpath("//li/p"):
    print(item.text)
Upvotes: 1
Views: 50
Reputation: 18799
It looks like you want to depend on the h1 tag containing Demarcation, so start from it:
//h1[contains(., "Demarcation")]/preceding-sibling::p[contains(., "Yes")][position()<4]
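The idea is to select the p elements that precede that h1 as siblings. I added position()<4 so you only get three; since preceding-sibling is a reverse axis, these are the three closest to the h1. You can remove that predicate if you need all of the matching p elements: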
//h1[contains(., "Demarcation")]/preceding-sibling::p[contains(., "Yes")]
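As a rough check of how the two expressions differ, here is a minimal sketch using lxml. The markup is a hypothetical variant of the question's sample with four numbered paragraphs (not the original data), so it is visible which ones position()<4 keeps. The variable names are made up for illustration.

```python
from lxml.html import fromstring

# Hypothetical variant of the question's markup: four matching paragraphs,
# numbered so the selected ones can be told apart.
htmldoc = """
<li>
  <p>Yes 0</p>
  <p>Yes 1</p>
  <p>Yes 2</p>
  <p>Yes 3</p>
  <h1>Demarcation</h1>
  <p>No</p>
</li>
"""
root = fromstring(htmldoc)

base = '//h1[contains(., "Demarcation")]/preceding-sibling::p[contains(., "Yes")]'

# Without the position predicate: every matching paragraph before the h1.
all_before = [p.text for p in root.xpath(base)]

# With it: positions are counted backwards from the h1 (reverse axis),
# so the three paragraphs nearest to the h1 are kept.
nearest_three = [p.text for p in root.xpath(base + "[position()<4]")]

print(all_before)
print(nearest_three)
```

With exactly three Yes paragraphs, as in the question, both expressions should return the same elements.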
Upvotes: 0
Reputation: 52685
Try the expression below to select the paragraphs that are preceding siblings of the h1 header "Demarcation":
//li/p[following-sibling::h1[.="Demarcation"]]
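A minimal sketch of running this against the question's sample (with the mismatched closing tag corrected) might look like the following. The helper select_texts and the normalize-space variant are my own additions for illustration: the exact comparison .="Demarcation" would presumably stop matching if the real page had whitespace around the header text, and normalize-space is one way to guard against that.

```python
from lxml.html import fromstring

EXACT = '//li/p[following-sibling::h1[.="Demarcation"]]'
# Assumed variant, not part of the answer: tolerate surrounding whitespace.
TRIMMED = '//li/p[following-sibling::h1[normalize-space(.)="Demarcation"]]'


def select_texts(html, xpath):
    """Return the text of every element matched by xpath in html."""
    return [el.text for el in fromstring(html).xpath(xpath)]


question_html = """
<li>
<a>Nope</a>
<a>Nope</a>
<p>Yes</p>
<p>Yes</p>
<p>Yes</p>
<h1>Demarcation</h1>
<p>No</p>
<p>No</p>
<h1>Not this</h1>
<p>No</p>
<p>Not this</p>
</li>
"""

# Hypothetical markup with padding inside the header, to show the difference.
padded_html = question_html.replace(
    "<h1>Demarcation</h1>", "<h1> Demarcation </h1>"
)

print(select_texts(question_html, EXACT))
print(select_texts(padded_html, EXACT))
print(select_texts(padded_html, TRIMMED))
```

Only paragraphs that still have the "Demarcation" header somewhere after them among their siblings are selected, which is why the later No paragraphs drop out.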
Upvotes: 2