jbrehr

Reputation: 815

XML - XSLT to HTML transformation - eliminating specific line/carriage returns

I am transforming an XML file to HTML using XSLT 3.0 and I'm having trouble eliminating whitespace before commas and periods. Below is an example of the precise problem: the XML contains line breaks/carriage returns, which are reproduced in the HTML. Ordinarily this isn't a problem, since the browser collapses a run of whitespace into a single space; however, as the example below shows, that still leaves a space before commas and periods.

(Note about XML: this is a text-encoding of a medieval manuscript, and therefore can have various elements within it, and it can appear nested within other elements at various levels).

XML:

           <persName>
              <choice>
                 <orig>ar. p<hi rend="sup">a</hi>der</orig>
                 <reg>Arnaldum Prader</reg>
              </choice>
           </persName> et socium eius hereticos et vidit ibi cum eis <persName>
              <choice>
                 <orig>P. barrau</orig>
                 <reg>Poncium Barrau</reg>
              </choice>
           </persName>, <persName>
              <choice>
                 <orig>Iordanetū del maſ</orig>
                 <reg>Iordanetum del Mas</reg>
              </choice>
           </persName>, <persName>
              <choice>
                 <orig>Iordanū de quiders</orig>
                 <reg>Iordanum de Quiders</reg>
              </choice>
           </persName> et <persName>
              <choice>
                 <orig>W. Vitał</orig>
                 <reg>Willelmum Vitalis</reg>
              </choice>
           </persName> predictum et <persName>
              <choice>
                 <orig>ux̄ dc̄ī W. Vitał</orig>
                 <reg>uxor dicti Willelmi Vitalis</reg>
              </choice>
           </persName>.

XSLT templates:

<!-- format super/sub scripts -->
<xsl:template match="tei:hi" name="template_supersub">
    <xsl:choose>
        <xsl:when test="@rend ='sup'"><sup class="subsup"><xsl:apply-templates/></sup></xsl:when>
        <xsl:when test="@rend ='sub'"><sub class="subsup"><xsl:apply-templates/></sub></xsl:when>
    </xsl:choose> 
</xsl:template>

<!-- parse persName into <spans> -->
<xsl:template match="tei:persName/tei:choice/tei:reg">
    <span class="interpretive"><xsl:apply-templates/></span>
</xsl:template>

<xsl:template match="tei:persName/tei:choice/tei:orig">
    <span class="diplomatic"><xsl:apply-templates/></span>
</xsl:template>

Current HTML output:

     <span class="diplomatic">ar. p<sup class="subsup">a</sup>der</span>
     <span class="interpretive">Arnaldum Prader</span>

      et socium eius hereticos et vidit ibi cum eis 

     <span class="diplomatic">P. barrau</span>
     <span class="interpretive">Poncium Barrau</span>

     , 

     <span class="diplomatic">Iordanetū del maſ</span>
     <span class="interpretive">Iordanetum del Mas</span>

     , 

     <span class="diplomatic">Iordanū de quiders</span>
     <span class="interpretive">Iordanum de Quiders</span>

      et 

     <span class="diplomatic">W. Vitał</span>
     <span class="interpretive">Willelmum Vitalis</span>

      predictum et 

     <span class="diplomatic">ux̄ dc̄ī W. Vitał</span>
     <span class="interpretive">uxor dicti Willelmi Vitalis</span>

     .

Final, problematic output:

Arnaldum Prader et socium eius hereticos et vidit ibi cum eis Poncium Barrau , Iordanetum del Mas , Iordanum de Quiders et Willelmum Vitalis predictum et uxor dicti Willelmi Vitalis .

Various combinations of strip-space, replace() and translate() have not solved this problem. They usually end up collapsing EVERY whitespace between elements.

What I would ideally like is no space before commas and periods, and one space after a comma or period. But I can't find a mechanism, let alone a hack, to achieve this. Thanks.

Desired HTML output:

 <span class="diplomatic">ar. p<sup class="subsup">a</sup>der</span>
 <span class="interpretive">Arnaldum Prader</span> et socium eius 
 hereticos et vidit ibi cum eis <span class="diplomatic">P. 
 barrau</span><span class="interpretive">Poncium Barrau</span>, <span 
 class="diplomatic">Iordanetū del maſ</span><span 
 class="interpretive">Iordanetum del Mas</span>, <span 
 class="diplomatic">Iordanū de quiders</span><span 
 class="interpretive">Iordanum de Quiders</span> et <span 
 class="diplomatic">W. Vitał</span><span class="interpretive">Willelmum 
 Vitalis</span> predictum et <span class="diplomatic">ux̄ dc̄ī W. 
 Vitał</span><span class="interpretive">uxor dicti Willelmi 
 Vitalis</span>.

Upvotes: 3

Views: 173

Answers (2)

friedemann_bach

Reputation: 1458

In your answer to your own post you wrote that you "don't understand why this makes a difference". Let me try to help: you need to keep the whitespace-only text nodes that are children of choice and persName[choice] from being processed, i.e. literally the line breaks and indentation between <choice> and <orig>, for example. They are not part of your content, only of the TEI structure, and have to be ignored. This issue will recur often, and at different levels, when you work with TEI.

The templates below demonstrate a more explicit way to handle this problem. Instead of applying templates to all child nodes (which includes the text nodes), you select only the elements you want in your output.

<xsl:template match="tei:choice">
    <xsl:apply-templates select="tei:reg"/>
    <xsl:apply-templates select="tei:orig"/>
</xsl:template>

<xsl:template match="tei:persName[tei:choice]">
    <xsl:apply-templates select="tei:choice"/>
</xsl:template>

Final remark: Be aware of your schema. If persName is allowed to contain non-whitespace text outside of choice (and it usually is), you should treat this differently. The solution here works only if persName always contains choice with reg and orig.
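
If persName can contain real text outside of choice, one possible alternative to the two templates above is to keep the default processing and add an empty template that matches only the whitespace-only text nodes directly under these two elements. This is a minimal sketch, not tested against your full document, and it assumes the tei prefix is bound to the TEI namespace as in your stylesheet:

<!-- Drop text nodes that contain nothing but whitespace when they are
     direct children of persName or choice; all other text is kept. -->
<xsl:template match="tei:persName/text()[not(normalize-space())]
                   | tei:choice/text()[not(normalize-space())]"/>

Your existing templates for reg and orig would then still produce the spans. Declaring <xsl:strip-space elements="tei:persName tei:choice"/> at the top level should have much the same effect. One caveat for both variants: a whitespace-only node that separates two child elements inside persName (a space between a forename and a surname element, say) is dropped as well, so check this against your encoding.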

Upvotes: 1

jbrehr

Reputation: 815

Posting a response to my own question in order to avoid a really long complicated post.

I adjusted this XSL:

<!-- parse persName into <spans> -->
<xsl:template match="tei:persName/tei:choice/tei:reg">
    <span class="interpretive"><xsl:apply-templates/></span>
</xsl:template>

<xsl:template match="tei:persName/tei:choice/tei:orig">
    <span class="diplomatic"><xsl:apply-templates/></span>
</xsl:template>

To this XSL:

<!-- parse persName into <spans> -->
<xsl:template match="tei:persName">
<span class="interpretive"><xsl:apply-templates select="tei:choice/tei:reg"/></span><span class="diplomatic"><xsl:apply-templates select="tei:choice/tei:orig"/></span>
</xsl:template>

And now it exports the HTML exactly as needed. No other adjustments to the XSL file. I don't understand why this makes a difference, but it's a big difference.

New HTML:

 <span class="interpretive">Arnaldum Prader</span><span 
 class="diplomatic">ar. p<sup class="subsup">a</sup>der</span> et 
 socium eius hereticos et vidit ibi cum eis <span 
 class="interpretive">Poncium Barrau</span><span class="diplomatic">P. 
 barrau</span>, <span class="interpretive">Iordanetum del Mas</span>
 <span class="diplomatic">Iordanetū<span class="line_num diplomatic">
 <span class="interpretive"> </span>del maſ</span>, <span 
 class="interpretive">Iordanum de Quiders</span><span 
 class="diplomatic">Iordanū de quiders</span> et <span 
 class="interpretive">Willelmum Vitalis</span><span 
 class="diplomatic">W. Vitał</span> predictum et <span 
 class="interpretive">uxor dicti Willelmi Vitalis</span><span 
 class="diplomatic">ux̄ dc̄ī W. Vitał</span>.
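
Note that this puts the interpretive span before the diplomatic one, the reverse of the desired output in my question. If the original order matters, swapping the two spans in the template should restore it. A sketch of the same template with only the order changed, not tested beyond this sample:

<!-- parse persName into <spans>, diplomatic (orig) first -->
<xsl:template match="tei:persName">
<span class="diplomatic"><xsl:apply-templates select="tei:choice/tei:orig"/></span><span class="interpretive"><xsl:apply-templates select="tei:choice/tei:reg"/></span>
</xsl:template>

As before, the two spans are kept on one line so that no whitespace text node from the stylesheet ends up between them.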

Upvotes: 0
