Keith Davies

Reputation: 215

XSLT: Splitting element contents on a text delimiter, keeping elements

I'm trying to parse element contents and split them on a delimiter, while keeping all elements in the parent. I don't need -- don't want -- to find the delimiter inside the child elements.

<data>
  <parse-field>Some text <an-element /> more text; cheap win? ;
    <another-element>with delimiter;!</another-element>; final text</parse-field>
</data>

Should become

<data>
  <parsed-field>
    <field>Some text <an-element /> more text</field>
    <field>cheap win?</field>
    <field><another-element>with delimiter;!</another-element></field>
    <field>final text</field>
  </parsed-field>
</data>

I've got a hacked-together solution that examines all "parse-field/text()" and replaces the delimiter with <token />, then makes a second pass to pick out the pieces around the <token>s, but it's... hacked. And unpleasant. I'm wondering if there's a better way.

I'm using XSLT-2.0, open to XSLT-1.0 solutions. SAXON processor.

Upvotes: 1

Views: 3986

Answers (2)

Keith Davies

Reputation: 215

Best approach I've had so far, in simple form:

<xsl:variable name="delimiter" select="';'" />

<xsl:template match="foo">
  <xsl:copy>
    <xsl:apply-templates select="@*" />
    <xsl:call-template name="tokenize" />
  </xsl:copy>
</xsl:template>

<xsl:template name="tokenize">
  <xsl:variable name="rough">
    <xsl:apply-templates mode="tokenize" />
  </xsl:variable>
  <xsl:copy>
    <!-- group text and element nodes alike; each <delimiter/> closes a group -->
    <xsl:for-each-group select="$rough/node()" group-ending-with="delimiter">
      <field><xsl:apply-templates select="current-group()[not(self::delimiter)]" /></field>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

<xsl:template match="*" mode="tokenize">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()" />
  </xsl:copy>
</xsl:template>

<xsl:template match="text()" mode="tokenize">
  <xsl:analyze-string select="." regex="([^{$delimiter}]*){$delimiter}">
    <xsl:matching-substring>
      <xsl:value-of select="regex-group(1)" /><delimiter/>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="." />
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

Upvotes: 1

michael.hor257k

Reputation: 116959

This is not (yet?) a complete answer, just an outline of a possible approach. If you make your first pass something like this:

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="parse-field/text()">
    <xsl:call-template name="tokenize">
        <xsl:with-param name="text" select="."/>
    </xsl:call-template>
</xsl:template>

<xsl:template name="tokenize">
    <xsl:param name="text"/>
    <xsl:param name="delimiter" select="';'"/>
    <xsl:choose>
        <xsl:when test="contains($text, $delimiter)">
            <field>
                <xsl:value-of select="substring-before($text, $delimiter)"/>
            </field>
            <!-- recursive call -->
            <xsl:call-template name="tokenize">
                <xsl:with-param name="text" select="substring-after($text, $delimiter)"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:when test="position()=last()">
            <field><xsl:value-of select="$text"/></field>
        </xsl:when>
        <xsl:when test="$text">
            <text><xsl:value-of select="$text"/></text>
        </xsl:when>
    </xsl:choose>
</xsl:template>

you would obtain:

<?xml version="1.0" encoding="UTF-8"?>
<data>
   <parse-field>
      <text>Some text </text>
      <an-element/>
      <field> more text</field>
      <field> cheap win? </field>
      <another-element>with delimiter;!</another-element>
      <field/>
      <field> final text</field>
   </parse-field>
</data>

This is now a grouping problem, where elements of <parse-field> need to be grouped, with each group ending with <field>.
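
A minimal sketch of that grouping step in XSLT 2.0 might look like the following. It assumes it runs as a separate second pass over the intermediate document shown above, and that the original input contains no real child elements named <text> or <field> (they would be mistaken for the first-pass wrappers). The "unwrap" mode name and the <parsed-field> wrapper are choices made for this illustration, not part of the outline above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity template: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- second pass: every <field> emitted by the first pass closes a group -->
  <xsl:template match="parse-field">
    <parsed-field>
      <xsl:for-each-group select="*" group-ending-with="field">
        <field>
          <xsl:apply-templates select="current-group()" mode="unwrap"/>
        </field>
      </xsl:for-each-group>
    </parsed-field>
  </xsl:template>

  <!-- first-pass wrappers: keep only the text they carry -->
  <xsl:template match="text | field" mode="unwrap">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- genuine child elements: copy unchanged, delimiters inside them included -->
  <xsl:template match="*" mode="unwrap">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Leading and trailing whitespace inside each resulting <field> is left as is. Trimming it takes some care, because a plain normalize-space() on each fragment would also remove the space between text and an adjacent element.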

Upvotes: 2
