Tench

XSL analyze-string difficulty with tokenized strings

I need to tokenize a string and then run analyze-string on each of the tokens. This, however, seems impossible:

"XPTY0020: Required item type of the context item for the child axis is node(); supplied value has item type xs:string"

Apparently analyze-string requires a node context.

This is driving me insane, because analyze-string should, well, analyze strings, so I don't understand how to get around this problem.

My (simplified) XML looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<rows>
    <row>
        <field name="def">1) ἀλλά sed, vero 2) καί et 3) а cum condicionali iunctum aequiparat
            аште: 4) ἵνα ut chron.</field>
    </row>
    <row>
        <field name="def">ἡλοῦν clavo figere</field>
    </row>
</rows>

and my stylesheet looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">

    <xsl:strip-space elements="*"/>
    <xsl:output omit-xml-declaration="no" indent="yes"/>

    <xsl:template match="field[@name = 'def']">
        <entry>
            <xsl:call-template name="sense">
                <xsl:with-param name="def" select="."/>
            </xsl:call-template>
        </entry>
    </xsl:template>

    <xsl:template name="sense">
        <xsl:param name="def"/>
        <xsl:param name="separator" select="'\d{1,2}\)\s'"/>

        <xsl:for-each select="tokenize(normalize-space($def), $separator)">
            <xsl:if test="string-length(.) > 0">
                <xsl:element name="sense">
                    <xsl:attribute name="n">
                        <xsl:value-of select="position() - 1"/>
                    </xsl:attribute>
                    <!--this is the problematic bit, because current() is
                    a string here, and, paradoxically, analyze-string
                    cannot deal with it-->
                    <xsl:analyze-string select="current()"
                        regex="^([\p{IsGreek}\p{IsGreekExtended}]+[\s]*[\p{IsGreek}\p{IsGreekExtended}]*)(.*$)">
                        <xsl:matching-substring>
                            <greek>
                                <xsl:value-of select="regex-group(1)"/>
                                <xsl:value-of select="regex-group(2)"/>
                            </greek>
                        </xsl:matching-substring>
                        <xsl:non-matching-substring>
                            <xsl:value-of select="current()"/>
                        </xsl:non-matching-substring>
                    </xsl:analyze-string>
                </xsl:element>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

Without the problematic analyze-string instruction, the above stylesheet correctly produces the following output:

<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <sense n="1">ἀλλά sed, vero </sense>
   <sense n="2">καί et </sense>
   <sense n="3">а cum condicionali iunctum aequiparat аште: </sense>
   <sense n="4">ἵνα ut chron.</sense>
</entry>
<entry xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <sense n="0">ἡλοῦν clavo figere</sense>
</entry>

The stylesheet uses the tokenize() function to separate multiple senses. Then, for each of the identified senses, I want to use analyze-string to wrap the first Greek word in <greek></greek>.

What workaround can I use to make analyze-string work on tokens, i.e. strings, rather than nodes?

Many thanks in advance!


Answers (1)

Martin Honnen

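I think the problem is that the regex attribute of xsl:analyze-string is an attribute value template. Inside it, {IsGreek} is evaluated as the XPath expression IsGreek, i.e. the path step child::IsGreek. Within your xsl:for-each the context item is a string token rather than a node, which is why the error complains about the child axis. The curly braces that belong to the regex therefore need to be doubled: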

regex="^([\p{{IsGreek}}\p{{IsGreekExtended}}]+[\s]*[\p{{IsGreek}}\p{{IsGreekExtended}}]*)(.*$)"

Or you can define the pattern outside, in a variable, e.g.

<xsl:variable name="pattern">^([\p{IsGreek}\p{IsGreekExtended}]+[\s]*[\p{IsGreek}\p{IsGreekExtended}]*)(.*$)</xsl:variable>

and use regex="{$pattern}".
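Putting the variable approach together with the rest of the question's stylesheet might look like the untested sketch below. It assumes an XSLT 2.0 processor that accepts the IsGreek and IsGreekExtended block escapes, and the variable name greek-first is made up for illustration. It deviates from the question's code in three small ways: the pattern is given as a string literal in select (select is an XPath expression, not an attribute value template, so single braces are fine there), only regex-group(1) goes inside <greek> since the stated goal is to wrap the first Greek word, and exclude-result-prefixes drops the unused xs namespace from the output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs" version="2.0">

    <xsl:strip-space elements="*"/>
    <xsl:output omit-xml-declaration="no" indent="yes"/>

    <!-- Hypothetical variable name. Braces inside a select string literal
         are plain characters, so \p{IsGreek} needs no doubling here. -->
    <xsl:variable name="greek-first" as="xs:string"
        select="'^([\p{IsGreek}\p{IsGreekExtended}]+[\s]*[\p{IsGreek}\p{IsGreekExtended}]*)(.*$)'"/>

    <xsl:template match="field[@name = 'def']">
        <entry>
            <xsl:call-template name="sense">
                <xsl:with-param name="def" select="."/>
            </xsl:call-template>
        </entry>
    </xsl:template>

    <xsl:template name="sense">
        <xsl:param name="def"/>
        <xsl:param name="separator" select="'\d{1,2}\)\s'"/>

        <!-- The context item inside this for-each is an xs:string token -->
        <xsl:for-each select="tokenize(normalize-space($def), $separator)">
            <xsl:if test="string-length(.) > 0">
                <sense n="{position() - 1}">
                    <!-- regex is still an AVT, but the only expression in it
                         is a variable reference, so no axis step is evaluated
                         against the string context item -->
                    <xsl:analyze-string select="." regex="{$greek-first}">
                        <xsl:matching-substring>
                            <!-- wrap only the leading Greek word(s) -->
                            <greek><xsl:value-of select="regex-group(1)"/></greek>
                            <xsl:value-of select="regex-group(2)"/>
                        </xsl:matching-substring>
                        <xsl:non-matching-substring>
                            <!-- inside analyze-string, . is the current substring -->
                            <xsl:value-of select="."/>
                        </xsl:non-matching-substring>
                    </xsl:analyze-string>
                </sense>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

With the sample input, the third sense starts with a Cyrillic а rather than a Greek letter, so the pattern does not match it and the token is copied unchanged by xsl:non-matching-substring. Note also that [\s]* lets group 1 absorb the space after the first Greek word when the next word is not Greek.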
