Reputation: 525
I have the following code:
<div class = "content">
<table id="detailsTable">...</table>
<div class = "desc">
<p>Some text</p>
</div>
<p>Another text</p>
</div>
I want to select all the text inside the div with class 'content', which I currently get with this XPath:
doc.xpath('string(//div[@class="content"])')
The problem is that it selects all the text, including the text inside the 'table' element. I need to exclude the table's text from the result. How can I achieve that?
Upvotes: 0
Views: 325
Reputation: 5915
XPath 1.0 solutions:
substring-after(string(//div[@class="content"]),string(//div[@class="content"]/table))
Or just use concat:
concat(//table/following::p[1]," ",//table/following::p[2])
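To see what the substring-after expression is doing, here is a rough sketch in Python using only the standard library. It is not an XPath engine: the helper `substring_after` is a hypothetical name that mimics the XPath 1.0 function, the markup is the question's sample with the last `<p>` closed so it parses as XML, and the table's `...` is kept as literal text.

```python
import xml.etree.ElementTree as ET

# The question's sample, made well-formed; "..." is kept as literal table text.
HTML = """<div class="content">
<table id="detailsTable">...</table>
<div class="desc">
<p>Some text</p>
</div>
<p>Another text</p>
</div>"""


def substring_after(text, marker):
    """Mimic XPath 1.0 substring-after(): the text following the first match."""
    if marker == "":
        return text  # XPath returns the whole string for an empty marker
    return text.partition(marker)[2]  # empty string when marker is absent


div = ET.fromstring(HTML)
full_text = "".join(div.itertext())                  # like string(div)
table_text = "".join(div.find("table").itertext())   # like string(div/table)
result = substring_after(full_text, table_text)

print(" ".join(result.split()))  # Some text Another text
```

This also shows the limits of the trick: anything in the div that comes before the table is dropped along with it, and if the table's text is empty or happens to appear earlier in the div, the cut lands in the wrong place.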
Upvotes: 1
Reputation: 163595
The XPath expression //div[@class="content"] selects the div element - nothing more and nothing less - and applying the string() function gives you the string value of the element, which is the concatenation of all its descendant text nodes.
Getting all the text except for that contained in one particular child is probably not possible in XPath 1.0. With XPath 2.0 it can be done as
string-join(//div[@class="content"]/(node() except table)//text(), '')
But for this kind of manipulation, you're really in the realm of transformation rather than pure selection, so you're stretching the limits of what XPath is designed for.
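If you go the transformation route in the host language instead, a minimal sketch could look like the following. It assumes Python and the standard-library `xml.etree.ElementTree`, which may not be what the question's `doc` object comes from, and it assumes the markup is well-formed XML (the last `<p>` is closed here). The function name `text_excluding_child` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The question's sample, made well-formed; "..." is kept as literal table text.
HTML = """<div class="content">
<table id="detailsTable">...</table>
<div class="desc">
<p>Some text</p>
</div>
<p>Another text</p>
</div>"""


def text_excluding_child(element, tag):
    """Join the element's text, skipping direct children named `tag`."""
    parts = [element.text or ""]
    for child in element:
        if child.tag != tag:
            # itertext() covers the child's whole subtree, but not its tail
            parts.append("".join(child.itertext()))
        # Tail text follows the child and belongs to the parent, so keep it
        parts.append(child.tail or "")
    return "".join(parts)


div = ET.fromstring(HTML)
text = text_excluding_child(div, "table")

print(" ".join(text.split()))  # Some text Another text
```

Unlike the substring approach, this does not depend on where the table sits inside the div or on what text it contains. Real-world HTML that is not well-formed XML would need an HTML parser in place of `ET.fromstring`.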
Upvotes: 0