Daniel

Reputation: 75

Confusing whitespace issue in XSLT

I have two versions of a document encoded in one TEI XML file and wish to output one of the versions to a text file. Here’s the sample XML:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
      <fileDesc>
         <titleStmt>
            <title>Title</title>
         </titleStmt>
         <publicationStmt>
            <p>Publication Information</p>
         </publicationStmt>
         <sourceDesc>
            <p>Information about the source</p>
         </sourceDesc>
      </fileDesc>
  </teiHeader>
  <text>
      <body>
         <p>John Q Doe was born in 
            <app>
               <rdg wit="text1">Omaha</rdg>
               <rdg wit="text2">Lincoln</rdg>
            </app>
        in 1950. But was he
            <app>
               <rdg wit="text1">happy</rdg>
               <rdg wit="text2">glad</rdg>
            </app>?
        Some say no.
         </p>
      </body>
  </text>
</TEI>

And here’s the sample XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs tei" version="2.0">

    <xsl:output omit-xml-declaration="yes" method="text" encoding="UTF-8"/>

    <xsl:template match="text()">
        <xsl:value-of select='normalize-space()'/>
    </xsl:template>


    <xsl:template match="/">
        <xsl:apply-templates></xsl:apply-templates>
    </xsl:template>

    <xsl:template match="tei:teiHeader">
    </xsl:template>

    <xsl:template match="tei:app">       
        <xsl:apply-templates/>
    </xsl:template>


    <xsl:template match="tei:rdg[@wit='text1']">
        <xsl:apply-templates/>
    </xsl:template>


    <!-- Cancel out the alternate version of the text-->
    <xsl:template match="tei:rdg[@wit='text2']">
    </xsl:template>

</xsl:stylesheet>

What I want to output is “John Q Doe was born in Omaha in 1950. But was he happy? Some say no.” What I end up with is “John Q Doe was born inOmahain 1950. But was hehappy? Some say no.” So I somehow need to preserve a single space around the <app> elements. I can’t use <xsl:preserve-space> because I use extra whitespace in the source for readability, and I can’t simply use <xsl:text> to insert spaces in the template that matches tei:app, because sometimes punctuation comes immediately after the <app> element, as the question mark does above. I’m stumped.

Upvotes: 2

Views: 1445

Answers (2)

Ian Roberts

Reputation: 122414

It looks like you essentially want a variant of normalize-space() that collapses each run of whitespace (including runs at the start and end of the string) down to a single space, without also stripping leading and trailing whitespace. Since you're in XSLT 2.0, you can do that with a simple regular expression:

<xsl:template match="text()">
    <xsl:value-of select="replace(., '\s+', ' ')"/>
</xsl:template>

You would also need to add

<xsl:strip-space elements="*"/>

to the top of the stylesheet in order to suppress text nodes that are entirely whitespace. Without that, you'll end up with an extra space in your output for each all-whitespace text node (e.g. between <text> and <body>, <body> and <p>, </rdg> and </app>, etc.). The strip-space directive only affects all-whitespace text nodes; it does not affect the whitespace within text nodes that also contain useful non-space content.
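
Putting the two pieces together, a minimal sketch of the whole stylesheet might look like the following. It assumes the sample document from the question and an XSLT 2.0 processor; the templates for "/" and tei:app are left out because the built-in template rules already descend into child nodes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0" version="2.0">

    <xsl:output method="text" encoding="UTF-8"/>

    <!-- Drop text nodes that consist only of whitespace (indentation between elements) -->
    <xsl:strip-space elements="*"/>

    <!-- Collapse each run of whitespace to one space, keeping leading and trailing spaces -->
    <xsl:template match="text()">
        <xsl:value-of select="replace(., '\s+', ' ')"/>
    </xsl:template>

    <!-- Suppress the header -->
    <xsl:template match="tei:teiHeader"/>

    <!-- Suppress the alternate reading; text1 readings fall through to the built-in rules -->
    <xsl:template match="tei:rdg[@wit='text2']"/>

</xsl:stylesheet>
```

With the sample input this should produce "John Q Doe was born in Omaha in 1950. But was he happy? Some say no." followed by a single trailing space, which comes from the whitespace before </p>.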

Upvotes: 1

Philipp

Reputation: 4749

You need to add the missing spaces. If you put an <xsl:text> </xsl:text> before and after the <xsl:apply-templates/>, you will get a space before and after the elements:

<xsl:template match="tei:rdg[@wit='text1']">
    <xsl:text> </xsl:text>
    <xsl:apply-templates/>
    <xsl:text> </xsl:text>
</xsl:template>

This gives the following output:

John Q Doe was born in Omaha in 1950. But was he happy ? Some say no.
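
If the space before the question mark is a problem, one possible refinement is to move the spacing onto tei:app and emit the trailing space only when the text that follows does not start with punctuation. This is only a sketch under a few assumptions: an XSLT 2.0 processor (for matches() and the \p{P} character class), the normalize-space() text template from the question, and a check that looks only at the nearest following sibling text node, so an <app> followed directly by another element may need extra handling.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0" version="2.0">

    <xsl:output method="text" encoding="UTF-8"/>

    <!-- Same text handling as in the question -->
    <xsl:template match="text()">
        <xsl:value-of select="normalize-space()"/>
    </xsl:template>

    <!-- Suppress the header -->
    <xsl:template match="tei:teiHeader"/>

    <xsl:template match="tei:app">
        <!-- Always separate the reading from the preceding text -->
        <xsl:text> </xsl:text>
        <!-- Output only the text1 reading; other readings are never selected -->
        <xsl:apply-templates select="tei:rdg[@wit='text1']"/>
        <!-- Add a trailing space unless the next text node begins with punctuation -->
        <xsl:if test="not(matches(normalize-space(following-sibling::text()[1]), '^\p{P}'))">
            <xsl:text> </xsl:text>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>
```

On the sample document this should give "John Q Doe was born in Omaha in 1950. But was he happy? Some say no."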

Upvotes: 0
