martinanton

XSLT search and replace punctuation mark

I have an XSLT cascade that transforms XML to TeX. In the last step I have a simple XML file with all the text between two tags, and I want to apply several search-and-replace routines to it.

So an input file like this:

<start>
    .–
    ,–
    {– 
</start>

when transformed with this XSLT (taken more or less verbatim from "Replacing strings in various XML files")

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>
    <xsl:param name="list">
        <words>
        <word>
            <search> / </search>
            <replace>\allowbreak\,\slash\,\allowbreak{}</replace>
        </word>
        <word>
            <search>.–</search>
            <replace>{\dotdash}</replace>
        </word>
        <word>
            <search>,–</search>
            <replace>{\commadash}</replace>
        </word>
        <word>
            <search>;–</search>
            <replace>{\semicolondash}</replace>
        </word>
        <word>
            <search>!–</search>
            <replace>{\excdash}</replace>
        </word>
        </words>
    </xsl:param>
    
    <xsl:template match="@*|*|comment()|processing-instruction()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    
    <xsl:template match="text()">
        <xsl:variable name="search" select="concat('(',string-join($list/words/word/search,'|'),')')"/>
        <xsl:analyze-string select="." regex="{$search}">
            <xsl:matching-substring>
                <xsl:value-of select="$list/words/word[search=current()]/replace"/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>

should produce the following output:

{\dotdash}

{\commadash}

{–

Unfortunately "{–" seems to trigger something and disappears. Can anyone explain why?

Answers (1)

Daniel Haley

Glad the original answer you linked to helped. Please consider upvoting if you haven't already. ;-)

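The problem is that . is special in regex: it matches any character. So <search>.–</search> matches any character followed by –, and that includes {–. The lookup word[search=current()] then finds no entry whose search text is {–, so an empty replacement is written and the text disappears.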

You should escape the . in your search variable:

<xsl:variable name="search" select="replace(concat('(',string-join($list/words/word/search,'|'),')'),'\.','\\.')"/>

You will need to escape any other special regex characters as well, so you might consider creating an xsl:function to make that part easier.
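One way to cover every special character at once is a single replace() over a character class, applied to each search term before the terms are joined, so the | separators themselves stay unescaped. The following is only a minimal sketch under a few assumptions: the function name so:escapeLiteral and the urn:example:so namespace are made up for illustration, the list is shortened to two entries, and it has not been run against your full cascade.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:so="urn:example:so"
  exclude-result-prefixes="xs so">

  <!-- Shortened list; the search terms stay unescaped so the lookup below still works. -->
  <xsl:param name="list">
    <words>
      <word>
        <search>.–</search>
        <replace>{\dotdash}</replace>
      </word>
      <word>
        <search>,–</search>
        <replace>{\commadash}</replace>
      </word>
    </words>
  </xsl:param>

  <!-- Hypothetical helper: put a backslash in front of every regex metacharacter. -->
  <xsl:function name="so:escapeLiteral" as="xs:string">
    <xsl:param name="literal" as="xs:string"/>
    <!-- In the replacement string, \\ is a literal backslash and $0 is the whole match. -->
    <xsl:sequence select="replace($literal, '[\\\|\.\?\*\+\(\)\{\}\[\]\^\$\-]', '\\$0')"/>
  </xsl:function>

  <!-- Identity template for everything that is not a text node. -->
  <xsl:template match="@*|*|comment()|processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <!-- Escape each term on its own, then join the escaped terms with | -->
    <xsl:variable name="search"
      select="string-join(for $s in $list/words/word/search return so:escapeLiteral($s), '|')"/>
    <xsl:analyze-string select="." regex="{$search}">
      <xsl:matching-substring>
        <xsl:value-of select="$list/words/word[search=current()]/replace"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input from the question, this should replace .– and ,– and leave {– untouched. Escaping per term also means the outer parentheses are no longer needed, and because the list itself is left as written, the word[search=current()] lookup still finds the matching entry.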

Here's an example of a function that will escape . and { for starters...

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:so="stackoverflow example" exclude-result-prefixes="so">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:param name="list">
    <words>
      <word>
        <search> / </search>
        <replace>\allowbreak\,\slash\,\allowbreak{}</replace>
      </word>
      <word>
        <search>.–</search>
        <replace>{\dotdash}</replace>
      </word>
      <word>
        <search>,–</search>
        <replace>{\commadash}</replace>
      </word>
      <word>
        <search>;–</search>
        <replace>{\semicolondash}</replace>
      </word>
      <word>
        <search>!–</search>
        <replace>{\excdash}</replace>
      </word>
      <!--<word>
        <search>{–</search>
        <replace>bam!</replace>
      </word>-->
    </words>
  </xsl:param>

  <xsl:function name="so:escapeRegex">
    <xsl:param name="regex"/>
    <xsl:analyze-string select="$regex" regex="\.|\{{">
      <xsl:matching-substring>
        <xsl:value-of select="concat('\',.)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:function>

  <xsl:template match="@*|*|comment()|processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:variable name="search" select="so:escapeRegex(concat('(',string-join($list/words/word/search,'|'),')'))"/>
    <xsl:analyze-string select="." regex="{$search}">
      <xsl:matching-substring>
        <xsl:message>"<xsl:value-of select="."/>" matched <xsl:value-of select="$search"/></xsl:message>
        <xsl:value-of select="$list/words/word[search=current()]/replace"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>

If you uncomment the last word in your list param, it will replace the {– in your example.
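As a defensive complement to escaping, you could also guard the lookup so that a match without a list entry is copied through instead of vanishing. This is a sketch rather than part of the fix: the variable name hit and the message text are my own, and the search term is deliberately left unescaped here so the guard actually fires on {–.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="list">
    <words>
      <word>
        <search>.–</search>
        <replace>{\dotdash}</replace>
      </word>
    </words>
  </xsl:param>

  <!-- Identity template for everything that is not a text node. -->
  <xsl:template match="@*|*|comment()|processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <!-- Left unescaped on purpose: '.' still matches any character here. -->
    <xsl:variable name="search" select="string-join($list/words/word/search, '|')"/>
    <xsl:analyze-string select="." regex="{$search}">
      <xsl:matching-substring>
        <!-- Look up the replacement for exactly the text that matched. -->
        <xsl:variable name="hit" select="$list/words/word[search=current()]/replace"/>
        <xsl:choose>
          <xsl:when test="exists($hit)">
            <xsl:value-of select="$hit"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- No entry for this text: report it and keep the original characters. -->
            <xsl:message>No list entry for "<xsl:value-of select="."/>"; copied unchanged.</xsl:message>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

The guard does not replace proper escaping; it only makes an over-eager pattern visible in the log instead of silently dropping text.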
