Reputation: 215
I'm trying to parse element contents and split them on a delimiter, while keeping all elements in the parent. I don't need -- don't want -- to find the delimiter inside the child elements.
<data>
<parse-field>Some text <an-element /> more text; cheap win? ;
<another-element>with delimiter;!</another-element>; final text</parse-field>
</data>
Should become
<data>
<parsed-field>
<field>Some text <an-element /> more text</field>
<field>cheap win?</field>
<field><another-element>with delimiter;!</another-element></field>
<field>final text</field>
</parsed-field>
</data>
I've got a hacked-together solution that examines all "parse-field/text()" and replaces the delimiter with <token />, then a second pass to pick out the pieces around the <token />s, but it's... hacked. And unpleasant. I'm wondering if there's a better way.
I'm using XSLT 2.0 but am open to XSLT 1.0 solutions. The processor is Saxon.
Upvotes: 1
Views: 3986
Reputation: 215
Best approach I've had so far, in simple form:
<xsl:variable name="delimiter" select="';'" />
<xsl:template match="foo">
<xsl:copy>
<xsl:apply-templates select="@*" />
<xsl:call-template name="tokenize" />
</xsl:copy>
</xsl:template>
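<xsl:template name="tokenize">
<!-- First pass: copy the children, marking each delimiter found in text with <delimiter/> -->
<xsl:variable name="rough">
<xsl:apply-templates mode="tokenize" />
</xsl:variable>
<!-- Second pass: group all nodes (text included); each group ends at a <delimiter/>. -->
<!-- No xsl:copy here: the calling template has already copied the element. -->
<xsl:for-each-group select="$rough/node()" group-ending-with="delimiter">
<!-- Relies on an identity template in the default mode to copy the grouped nodes -->
<field><xsl:apply-templates select="current-group()[not(self::delimiter)]" /></field>
</xsl:for-each-group>
</xsl:template>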
<xsl:template match="*" mode="tokenize">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="text()" mode="tokenize">
<xsl:analyze-string select="." regex="([^{$delimiter}]*){$delimiter}">
<xsl:matching-substring>
<xsl:value-of select="regex-group(1)" /><delimiter/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="." />
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
Upvotes: 1
Reputation: 116959
This is not (yet?) a complete answer, just an outline of a possible approach. If you made your first pass something like:
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="parse-field/text()">
<xsl:call-template name="tokenize">
<xsl:with-param name="text" select="."/>
</xsl:call-template>
</xsl:template>
<xsl:template name="tokenize">
<xsl:param name="text"/>
<xsl:param name="delimiter" select="';'"/>
<xsl:choose>
<xsl:when test="contains($text, $delimiter)">
<field>
<xsl:value-of select="substring-before($text, $delimiter)"/>
</field>
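<!-- recursive call on the text after the delimiter, passing the delimiter along -->
<xsl:call-template name="tokenize">
<xsl:with-param name="text" select="substring-after($text, $delimiter)"/>
<xsl:with-param name="delimiter" select="$delimiter"/>
</xsl:call-template>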
</xsl:when>
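<!-- no delimiter left and this is the last node being processed: the remainder closes the final field -->
<xsl:when test="position()=last()">
<field><xsl:value-of select="$text"/></field>
</xsl:when>
<!-- no delimiter left but more siblings follow: keep the remainder as <text> for the grouping pass -->
<xsl:when test="$text">
<text><xsl:value-of select="$text"/></text>
</xsl:when>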
</xsl:choose>
</xsl:template>
you would obtain:
<?xml version="1.0" encoding="UTF-8"?>
<data>
<parse-field>
<text>Some text </text>
<an-element/>
<field> more text</field>
<field> cheap win? </field>
<another-element>with delimiter;!</another-element>
<field/>
<field> final text</field>
</parse-field>
</data>
This is now a grouping problem, where the children of <parse-field> need to be grouped, with each group ending with a <field>.
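For what it's worth, here is a rough sketch of how that grouping pass might look in XSLT 2.0. It is not tested against your real data and makes a few assumptions: it runs as a separate stylesheet over the intermediate document above, the mode name "unwrap" is made up for illustration, and the original input has no genuine <text> or <field> elements of its own that could be confused with the markers from the first pass.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop whitespace-only text nodes between the children of parse-field,
       so indentation in the intermediate document does not end up in a group -->
  <xsl:strip-space elements="parse-field"/>

  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Group the children of parse-field; each group ends with a <field> marker -->
  <xsl:template match="parse-field">
    <parsed-field>
      <xsl:for-each-group select="node()" group-ending-with="field">
        <field>
          <xsl:apply-templates select="current-group()" mode="unwrap"/>
        </field>
      </xsl:for-each-group>
    </parsed-field>
  </xsl:template>

  <!-- The <text> and <field> markers contribute only their content -->
  <xsl:template match="text | field" mode="unwrap">
    <xsl:copy-of select="node()"/>
  </xsl:template>

  <!-- Any other node (e.g. an-element, another-element) is copied unchanged,
       so delimiters inside child elements are never touched -->
  <xsl:template match="node()" mode="unwrap">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

On the intermediate document above this should produce four <field> groups inside <parsed-field>. Leading and trailing whitespace inside each field is not trimmed here; if that matters, the text could be run through normalize-space() or a replace() in the first pass.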
Upvotes: 2