Reputation: 609
I am using this XPath query to try to grab the first three items under the "ASQ Package Price" heading:
//h2[contains(., 'ASQ Package Features')]/following-sibling::p
But it also grabs the other 3 items, so I end up with
Example 1 Example 2 Example 3 Example 4 Example 5 Example 6
I only want:
Example 1 Example 2 Example 3
How do I prevent XPath from scraping the three I don't want? In this case it seems it needs to stop at the <hr> tag.
<div itemprop="articleBody">
<h2>ASQ Package Price</h2>
<p class="">Example 1</p>
<p class="">Example 2</p>
<p class="">Example 3</p>
<hr>
<h2>ASQ Package Features </h2>
<p class="">Example 4</p>
<p class="">Example 5</p>
<p class="">Example 6</p>
</div>
Upvotes: 1
Views: 390
Reputation: 243599
Use:
(//h2[starts-with(., 'ASQ Package')])[1]/following-sibling::hr[1]
/preceding-sibling::p
Verification with XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"(//h2[starts-with(., 'ASQ Package')])[1]
/following-sibling::hr[1]
/preceding-sibling::p"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is run on the provided HTML (adjusted to be well-formed XHTML):
<html>
<div itemprop="articleBody">
<h2>ASQ Package Price</h2>
<p class="">Example 1</p>
<p class="">Example 2</p>
<p class="">Example 3</p>
<hr />
<h2>ASQ Package Features </h2>
<p class="">Example 4</p>
<p class="">Example 5</p>
<p class="">Example 6</p>
</div>
</html>
the XPath expression is evaluated, and all the nodes it selects are output:
<p class="">Example 1</p>
<p class="">Example 2</p>
<p class="">Example 3</p>
Explanation:
We need the preceding-sibling <p> elements of the first <hr> that is a following sibling of the first <h2> in the document whose string value starts with "ASQ Package".
The first such <h2> element is selected by this XPath expression:
(//h2[starts-with(., 'ASQ Package')])[1]
Then we select its first following-sibling <hr>:
(//h2[starts-with(., 'ASQ Package')])[1]/following-sibling::hr[1]
Then we select all its preceding-sibling <p> elements:
(//h2[starts-with(., 'ASQ Package')])[1]/following-sibling::hr[1]
/preceding-sibling::p
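For readers who want to trace the three steps outside an XPath engine, here is a minimal Python sketch. It uses only the standard library, whose `xml.etree.ElementTree` XPath subset has no sibling axes, so the steps are emulated by walking the sibling list by hand. The function name `paragraphs_before_first_hr` and the `XHTML` constant are hypothetical names introduced for this illustration; it is not a replacement for evaluating the expression itself.

```python
import xml.etree.ElementTree as ET

# The well-formed XHTML version of the markup from the question.
XHTML = """<html>
<div itemprop="articleBody">
<h2>ASQ Package Price</h2>
<p class="">Example 1</p>
<p class="">Example 2</p>
<p class="">Example 3</p>
<hr />
<h2>ASQ Package Features </h2>
<p class="">Example 4</p>
<p class="">Example 5</p>
<p class="">Example 6</p>
</div>
</html>"""


def paragraphs_before_first_hr(root, heading_prefix):
    """Hand-rolled equivalent (hypothetical helper) of
    (//h2[starts-with(., prefix)])[1]/following-sibling::hr[1]/preceding-sibling::p
    """
    # ElementTree elements do not know their parent, so build a lookup first.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    # Step 1: the first <h2> in document order whose string value starts with the prefix.
    for h2 in root.iter("h2"):
        if not "".join(h2.itertext()).startswith(heading_prefix):
            continue
        parent = parent_of.get(h2)
        if parent is None:  # the <h2> is the root element, so it has no siblings
            return []
        siblings = list(parent)
        start = siblings.index(h2)

        # Step 2: its first following-sibling <hr>.
        for j in range(start + 1, len(siblings)):
            if siblings[j].tag == "hr":
                # Step 3: every <p> sibling that precedes that <hr>.
                return [s for s in siblings[:j] if s.tag == "p"]
        return []  # the heading has no following-sibling <hr>
    return []  # no heading matched


root = ET.fromstring(XHTML)
print([p.text for p in paragraphs_before_first_hr(root, "ASQ Package")])
# ['Example 1', 'Example 2', 'Example 3']
```

Passing the longer prefix "ASQ Package Features" returns an empty list here, because the only heading that matches it has no <hr> after it. That is why the shorter prefix "ASQ Package" is the one that works in the expression above.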
Upvotes: 2
Reputation: 24940
Using XPath 2.0:
//h2/following-sibling::p intersect //hr/preceding-sibling::p
Using XPath 1.0:
//h2/following-sibling::p[not(preceding-sibling::hr)]
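The two expressions agree on the markup in the question, but they are not the same test in general: the 1.0 form keeps a <p> that has no <hr> before it, while the `intersect` form keeps a <p> that has an <hr> somewhere after it. The sketch below emulates both conditions in plain Python with the standard library (which cannot evaluate sibling axes itself). The function names and the `XHTML` constant are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

XHTML = """<html>
<div itemprop="articleBody">
<h2>ASQ Package Price</h2>
<p class="">Example 1</p>
<p class="">Example 2</p>
<p class="">Example 3</p>
<hr />
<h2>ASQ Package Features </h2>
<p class="">Example 4</p>
<p class="">Example 5</p>
<p class="">Example 6</p>
</div>
</html>"""


def xpath1_style(root):
    """Emulates //h2/following-sibling::p[not(preceding-sibling::hr)]."""
    selected = []
    for parent in root.iter():
        seen_h2 = False
        for child in parent:
            if child.tag == "hr":
                break  # every later sibling has a preceding <hr>
            if child.tag == "h2":
                seen_h2 = True
            elif child.tag == "p" and seen_h2:
                selected.append(child)
    return selected


def xpath2_style(root):
    """Emulates //h2/following-sibling::p intersect //hr/preceding-sibling::p."""
    selected = []
    for parent in root.iter():
        children = list(parent)
        tags = [c.tag for c in children]
        for i, child in enumerate(children):
            # Keep a <p> with an <h2> sibling before it and an <hr> sibling after it.
            if child.tag == "p" and "h2" in tags[:i] and "hr" in tags[i + 1:]:
                selected.append(child)
    return selected


root = ET.fromstring(XHTML)
print([p.text for p in xpath1_style(root)])  # ['Example 1', 'Example 2', 'Example 3']
print([p.text for p in xpath2_style(root)])  # ['Example 1', 'Example 2', 'Example 3']
```

If the page could contain a second <hr> further down, or a section with no <hr> at all, the two forms would start to diverge, so it is worth checking which condition matches the real pages being scraped.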
Upvotes: 0