CJ Dennis

Reputation: 4336

XPath 1.0 - Make selection based on value of text spread over multiple nodes

<root>
  <div>
    <p>this text</p>
    <p><span>fo</span><span>ob</span><span>ar</span></p>
  </div>

  <div>
    <p>this text</p>
    <p><span>fo</span><span>b</span><span>ar</span></p>
  </div>

  <div>
    <p>this text</p>
    <p><span>fooba</span><span>r</span></p>
  </div>

  <div>
    <p><span>foo</span>this text<span>bar</span></p>
  </div>

  <div>
    <p><span>foo</span><img/><span>bar</span></p>
  </div>

  <div>
    <p><span>foo</span><span>bar</span><span>baz</span></p>
  </div>

  <div>
    <p>foobar</p>
  </div>
</root>

Given the above XML, what XPath 1.0 query would select the <div>s in which foobar appears either within a single <span> or split across multiple consecutive <span>s?

I have tried using concat(), but that doesn't work because I would need to know the number of arguments in advance. Also, concat(//*, //*) is equivalent to concat(//*[1], //*[1]), because a node-set argument is converted to the string-value of its first node only, which is not what I want.

This is within PHP, so I only have XPath 1.0.

Upvotes: 0

Views: 441

Answers (2)

DSHCS

Reputation: 11

I had a document with paragraphs (<p>) whose string value (.) started with a prefix (question:). I needed to strip off the prefix and all of its ancestor elements, but retain the paragraph (<p>) itself and any elements following the prefix. The prefix could be distributed across more than one element at different depths in the XML. The solution was restricted to XSLT 1.0.

I found that by recursing across descendant::text() and keeping a running sum of the text node string lengths, I could determine when I had reached the text node containing the end of the prefix. Note the template match, which selects only paragraphs that start with the prefix; that is what allows the sum of the text node lengths alone to detect where to stop. You could also accumulate the actual string and use a different test (such as contains()) to determine when to stop.

Sample XML (excuse the complexity; it was needed for testing):

<?xml version="1.0" encoding="utf-8" ?>
<root>
    <p><d1><d2>q<a>u<b>e<c>s</c><d>t</d>i</b><e>o</e></a><f>n</f>:</d2></d1> text</p>
</root>

Sample XSL (the <trace> elements are only there to document how it works):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <xsl:output method="xml" indent="yes"/>

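  <!-- Match only paragraphs whose string value starts with the 9-character
       prefix, so the walk below only has to count characters. -->
  <xsl:template match="/root/p[substring(.,1,9)='question:']">
      <trace info="{concat('ML descendant::text()[1]:',' name=',name(),', .=',.)}"/>
      <!-- Start the walk at the paragraph's first text node. -->
      <xsl:apply-templates select="descendant::text()[1]" mode="m1"/>
  </xsl:template>

  <!-- Visit text nodes in document order, carrying the number of
       characters consumed so far in $length. -->
  <xsl:template mode="m1" match="text()">
      <xsl:param name="length" select="0"/>
      <xsl:variable name="temp" select="$length+string-length()"/>

      <trace info="{concat('m1:',' name=',name(),', length=',$temp,', .=',.)}"/>
      <xsl:choose>
          <!-- Fewer than 9 characters seen: move on to the next text node. -->
          <xsl:when test="$temp&lt;9">
            <xsl:apply-templates select="following::text()[1]" mode="m1">
                <xsl:with-param name="length" select="$temp"/>
            </xsl:apply-templates>
          </xsl:when>
          <!-- This text node contains the end of the prefix. -->
          <xsl:otherwise>
            <trace info="m1: prefix match"/>
          </xsl:otherwise>
      </xsl:choose>
  </xsl:template>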

</xsl:stylesheet>

Output

<?xml version="1.0" encoding="UTF-8"?>
    <trace info="ML descendant::text()[1]: name=p, .=question: text"/>
<trace info="m1: name=, length=1, .=q"/>
<trace info="m1: name=, length=2, .=u"/>
<trace info="m1: name=, length=3, .=e"/>
<trace info="m1: name=, length=4, .=s"/>
<trace info="m1: name=, length=5, .=t"/>
<trace info="m1: name=, length=6, .=i"/>
<trace info="m1: name=, length=7, .=o"/>
<trace info="m1: name=, length=8, .=n"/>
<trace info="m1: name=, length=9, .=:"/>
<trace info="m1: prefix match"/>
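
The variant mentioned above, accumulating the actual string rather than only its length, can be illustrated outside XSLT. The following is a minimal Python sketch, not part of the original solution: it uses the standard library's ElementTree, and the helper name text_nodes_covering_prefix is hypothetical. It walks the paragraph's text in document order and stops as soon as enough characters have been gathered to compare against the prefix.

```python
import xml.etree.ElementTree as ET

# Sample paragraph from the answer above.
SAMPLE = (
    "<root><p><d1><d2>q<a>u<b>e<c>s</c><d>t</d>i</b><e>o</e></a>"
    "<f>n</f>:</d2></d1> text</p></root>"
)


def text_nodes_covering_prefix(elem, prefix):
    """Return the leading text chunks that spell out `prefix`, or None.

    Hypothetical helper: walks the text in document order and accumulates
    the string itself instead of a running length.
    """
    seen = ""
    consumed = []
    for chunk in elem.itertext():  # text and tail strings, document order
        consumed.append(chunk)
        seen += chunk
        if len(seen) >= len(prefix):
            break  # enough text gathered to decide
    return consumed if seen.startswith(prefix) else None


paragraph = ET.fromstring(SAMPLE).find("p")
chunks = text_nodes_covering_prefix(paragraph, "question:")
print(chunks)
```

Because the accumulated string is compared directly, this version does not rely on the paragraph having been pre-filtered by its prefix; it returns None when the leading text does not match.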

Upvotes: 0

har07

Reputation: 89285

You can try this XPath:

/root/div[contains(normalize-space(.), 'foobar')]

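Note that, when used as a string, . evaluates to the string-value of the context node, which for an element is the concatenation of all of its descendant text nodes in document order.

Output in an XPath tester: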

Element='<div>
  <p>this text</p>
  <p>
    <span>fo</span>
    <span>ob</span>
    <span>ar</span>
  </p>
</div>'
Element='<div>
  <p>this text</p>
  <p>
    <span>fooba</span>
    <span>r</span>
  </p>
</div>'
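
To see what the string-value test does on each case, here is a rough Python sketch that emulates it with the standard library (ElementTree itself does not support contains() or normalize-space(), so both are approximated by the hypothetical helpers string_value and normalize_space). The sample is the question's XML as shown on this page, with whitespace compacted.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<root>
  <div><p>this text</p><p><span>fo</span><span>ob</span><span>ar</span></p></div>
  <div><p>this text</p><p><span>fo</span><span>b</span><span>ar</span></p></div>
  <div><p>this text</p><p><span>fooba</span><span>r</span></p></div>
  <div><p><span>foo</span>this text<span>bar</span></p></div>
  <div><p><span>foo</span><img/><span>bar</span></p></div>
  <div><p><span>foo</span><span>bar</span><span>baz</span></p></div>
  <div><p>foobar</p></div>
</root>"""


def string_value(elem):
    # Emulates the XPath string-value of an element:
    # all descendant text concatenated in document order.
    return "".join(elem.itertext())


def normalize_space(s):
    # Emulates XPath normalize-space(): collapse runs of XML whitespace
    # into one space and trim both ends.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" \t\r\n")


root = ET.fromstring(SAMPLE)
# 1-based positions of the <div>s whose normalized text contains 'foobar'.
matches = [
    position
    for position, div in enumerate(root.findall("div"), start=1)
    if "foobar" in normalize_space(string_value(div))
]
print(matches)  # [1, 3, 5, 6, 7]
```

Since the string-value ignores element boundaries, this test also accepts the <img/>-separated, foobarbaz and plain <p>foobar</p> cases. The tester output above lists fewer <div>s, so it may have been produced from a different version of the sample. If those cases must be excluded, the expression needs a stricter condition on the <span> structure.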

Upvotes: 2
