Reputation: 763
I'm trying to get the text from the first occurrence on the page of div/p, and only the first p. The <p> contains other tags (<b>, <a href>) and the returned text from <p> stops at any other tag. Is there a way to get this line to return all the text between <p> and </p>, even between embedded tags?
puts doc.xpath('html/body/div/p[1]/text()').first
Upvotes: 3
Views: 460
Reputation: 303254
With Nokogiri, as an alternative to writing more XPath, you can use Nokogiri::XML::Node#inner_text:
puts doc.at_xpath('html/body/div/p[1]').inner_text
Upvotes: 0
Reputation: 243449
Use:
string((//div/p)[1])
When this XPath expression is evaluated, the result is the string value of the first p in the document that is a child of a div.
By definition, the string value of an element is the concatenation (in document order) of all of its text-node descendants.
Therefore, you get exactly all the text in the subtree rooted at this p element, with any other nodes (elements, comments, PIs) skipped.
XSLT-based verification:
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <xsl:copy-of select="string(p)"/>
  </xsl:template>
</xsl:stylesheet>
When this transformation is applied to the following XML document (the question did not provide one):
<p>
Hello <b>
<a href="http://www.w3.org/TR/2008/REC-xml-20081126/">XML</a>
World!</b>
</p>
the result of the evaluated XPath expression is output:
Hello XML
World!
Upvotes: 5