chuckfinley

Reputation: 763

Return full text element (including child/descendant elements)

I'm trying to get the text from the first occurrence on the page of div/p, and only the first p. The <p> contains other tags (<b>, <a href>) and the returned text from <p> stops at any other tag. Is there a way to get this line to return all the text between <p> and </p>, even between embedded tags?

puts doc.xpath('html/body/div/p[1]/text()').first

Upvotes: 3

Views: 460

Answers (2)

Phrogz

Reputation: 303254

As an alternative to writing more XPath, with Nokogiri you can call Nokogiri::XML::Node#inner_text on the selected element:

puts doc.xpath('html/body/div/p[1]').inner_text

Upvotes: 0

Dimitre Novatchev

Reputation: 243449

Use:

string((//div/p)[1])

When this XPath expression is evaluated, the result is the string value of the first p in the document that is a child of a div.

By definition, the string value of an element is the concatenation (in document order) of all of its text-node descendants.

Therefore, you get exactly all the text in the subtree rooted at this p element, with any other nodes (elements, comments, PIs) skipped.

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
     <xsl:copy-of select="string(p)"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the following XML document (none was provided in the question):

<p>
 Hello <b>
  <a href="http://www.w3.org/TR/2008/REC-xml-20081126/">XML</a>
   World!</b>
</p>

the result of the evaluated XPath expression is output:

 Hello XML
   World!

Upvotes: 5
