Reputation: 815
I am transforming an XML file to HTML using XSLT 3.0, and I'm having trouble eliminating whitespace before commas and periods. Below is an example of the exact problem: the XML contains line breaks, and these are reproduced in the HTML. Ordinarily this isn't a problem, because the browser collapses runs of whitespace into a single space. However, as you can see in the example below, a space is preserved before commas and periods.
(A note about the XML: this is a text encoding of a medieval manuscript, so it can contain various elements, and this structure can appear nested within other elements at various levels.)
XML:
<persName>
<choice>
<orig>ar. p<hi rend="sup">a</hi>der</orig>
<reg>Arnaldum Prader</reg>
</choice>
</persName> et socium eius hereticos et vidit ibi cum eis <persName>
<choice>
<orig>P. barrau</orig>
<reg>Poncium Barrau</reg>
</choice>
</persName>, <persName>
<choice>
<orig>Iordanetū del maſ</orig>
<reg>Iordanetum del Mas</reg>
</choice>
</persName>, <persName>
<choice>
<orig>Iordanū de quiders</orig>
<reg>Iordanum de Quiders</reg>
</choice>
</persName> et <persName>
<choice>
<orig>W. Vitał</orig>
<reg>Willelmum Vitalis</reg>
</choice>
</persName> predictum et <persName>
<choice>
<orig>ux̄ dc̄ī W. Vitał</orig>
<reg>uxor dicti Willelmi Vitalis</reg>
</choice>
</persName>.
XSL templates:
<!-- format super/sub scripts -->
<xsl:template match="tei:hi" name="template_supersub">
<xsl:choose>
<xsl:when test="@rend ='sup'"><sup class="subsup"><xsl:apply-templates/></sup></xsl:when>
<xsl:when test="@rend ='sub'"><sub class="subsup"><xsl:apply-templates/></sub></xsl:when>
</xsl:choose>
</xsl:template>
<!-- parse persName into <spans> -->
<xsl:template match="tei:persName/tei:choice/tei:reg">
<span class="interpretive"><xsl:apply-templates/></span>
</xsl:template>
<xsl:template match="tei:persName/tei:choice/tei:orig">
<span class="diplomatic"><xsl:apply-templates/></span>
</xsl:template>
Current HTML output:
<span class="diplomatic">ar. p<sup class="subsup">a</sup>der</span>
<span class="interpretive">Arnaldum Prader</span>
et socium eius hereticos et vidit ibi cum eis
<span class="diplomatic">P. barrau</span>
<span class="interpretive">Poncium Barrau</span>
,
<span class="diplomatic">Iordanetū del maſ</span>
<span class="interpretive">Iordanetum del Mas</span>
,
<span class="diplomatic">Iordanū de quiders</span>
<span class="interpretive">Iordanum de Quiders</span>
et
<span class="diplomatic">W. Vitał</span>
<span class="interpretive">Willelmum Vitalis</span>
predictum et
<span class="diplomatic">ux̄ dc̄ī W. Vitał</span>
<span class="interpretive">uxor dicti Willelmi Vitalis</span>
.
Final rendered output (problematic):
Arnaldum Prader et socium eius hereticos et vidit ibi cum eis Poncium Barrau , Iordanetum del Mas , Iordanum de Quiders et Willelmum Vitalis predictum et uxor dicti Willelmi Vitalis .
Various combinations of strip-space, replace(), and translate() have not solved this. They usually end up collapsing EVERY whitespace between elements.
What I would ideally like is no space before commas and periods, and one space after a comma or period. But I can't find a mechanism, let alone a hack, to achieve this. Thanks.
Desired HTML output:
<span class="diplomatic">ar. p<sup class="subsup">a</sup>der</span>
<span class="interpretive">Arnaldum Prader</span> et socium eius
hereticos et vidit ibi cum eis <span class="diplomatic">P.
barrau</span><span class="interpretive">Poncium Barrau</span>, <span
class="diplomatic">Iordanetū del maſ</span><span
class="interpretive">Iordanetum del Mas</span>, <span
class="diplomatic">Iordanū de quiders</span><span
class="interpretive">Iordanum de Quiders</span> et <span
class="diplomatic">W. Vitał</span><span class="interpretive">Willelmum
Vitalis</span> predictum et <span class="diplomatic">ux̄ dc̄ī W.
Vitał</span><span class="interpretive">uxor dicti Willelmi
Vitalis</span>.
Upvotes: 3
Views: 173
Reputation: 1458
In your answer to your own post you wrote that you "don't understand why this makes a difference". Let me try to help: you need to keep all whitespace-only child nodes of choice and persName[choice] from being processed, literally the whitespace between <choice> and <orig>, for example. These nodes are not part of your content, only of the TEI structure, and have to be ignored. This issue will recur often, and at different levels, when you work with TEI.
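To see where the stray space comes from, here is a rough model in Python (standard-library xml.etree.ElementTree, not XSLT). It copies every text node, whitespace-only ones included. That is what the built-in template rules do for persName and choice in the question's stylesheet, since no template matches those elements there. The TEI namespace is left out, render_all and visible_text are hypothetical helpers, and hiding the .diplomatic spans is an assumption based on the question's final output.

```python
import re
import xml.etree.ElementTree as ET

# Fragment from the question; the TEI namespace is omitted for brevity.
SAMPLE = """<p>cum eis <persName>
<choice>
<orig>P. barrau</orig>
<reg>Poncium Barrau</reg>
</choice>
</persName>, <persName>
<choice>
<orig>Iordanetū del maſ</orig>
<reg>Iordanetum del Mas</reg>
</choice>
</persName>.</p>"""

SPAN_CLASS = {"reg": "interpretive", "orig": "diplomatic"}


def render_all(elem):
    """Copy every text node, whitespace-only ones included (like the built-in
    XSLT rules for persName and choice), and wrap reg/orig in spans."""
    out = [elem.text or ""]
    for child in elem:
        if child.tag in SPAN_CLASS:
            out.append(f'<span class="{SPAN_CLASS[child.tag]}">' + "".join(child.itertext()) + "</span>")
        else:
            out.append(render_all(child))
        out.append(child.tail or "")  # text node that follows the child
    return "".join(out)


def visible_text(html):
    """Rough browser view: assume .diplomatic spans are hidden by CSS,
    strip the remaining tags and collapse whitespace runs to one space."""
    html = re.sub(r'<span class="diplomatic">.*?</span>', "", html)
    html = re.sub(r"<[^>]+>", "", html)
    return re.sub(r"\s+", " ", html).strip()


html = render_all(ET.fromstring(SAMPLE))
print(repr(html))          # newlines from inside persName/choice end up before ","
print(visible_text(html))  # cum eis Poncium Barrau , Iordanetum del Mas .
```

The newline after </choice> and the one after </persName> are both emitted before ", ". The browser then collapses them into the single space seen before the comma.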
The templates below show a more deliberate way to handle this. Instead of applying templates to all child nodes (which includes text nodes), you explicitly select only the elements you want in your output.
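<!-- process only reg and orig; whitespace-only text nodes inside choice are never visited -->
<xsl:template match="tei:choice">
<xsl:apply-templates select="tei:reg"/>
<xsl:apply-templates select="tei:orig"/>
</xsl:template>
<!-- for a persName that wraps a choice, visit only the choice element -->
<xsl:template match="tei:persName[tei:choice]">
<xsl:apply-templates select="tei:choice"/>
</xsl:template>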
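Here is a rough Python model of what these two templates do. It uses standard-library xml.etree.ElementTree, not XSLT. render_selected is a hypothetical helper, and the TEI namespace is omitted. For a persName that contains a choice, only the reg and orig elements are visited, so the newlines between <choice> and <orig> never reach the output. The ", " and "." that follow each persName are still copied.

```python
import xml.etree.ElementTree as ET

# Fragment from the question; the TEI namespace is omitted for brevity.
SAMPLE = """<p>cum eis <persName>
<choice>
<orig>P. barrau</orig>
<reg>Poncium Barrau</reg>
</choice>
</persName>, <persName>
<choice>
<orig>Iordanetū del maſ</orig>
<reg>Iordanetum del Mas</reg>
</choice>
</persName>.</p>"""


def span(part, cls):
    """Wrap an element's text content in a span with the given class."""
    return f'<span class="{cls}">' + "".join(part.itertext()) + "</span>"


def render_selected(elem):
    """For persName[choice], visit only choice/reg and then choice/orig,
    so whitespace-only text inside persName and choice is never copied."""
    out = [elem.text or ""]
    for child in elem:
        choice = child.find("choice") if child.tag == "persName" else None
        if choice is not None:
            out += [span(p, "interpretive") for p in choice.findall("reg")]
            out += [span(p, "diplomatic") for p in choice.findall("orig")]
        else:
            out.append(render_selected(child))
        out.append(child.tail or "")  # ", " and "." are tails of persName, so they are kept
    return "".join(out)


print(render_selected(ET.fromstring(SAMPLE)))
```

This matches the self-answer's observation that selecting tei:choice/tei:reg and tei:choice/tei:orig explicitly removes the space before the punctuation.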
Final remark: be aware of your schema. If persName is allowed to contain non-whitespace text outside of choice (and it usually is), you need to treat this differently, because these templates would silently drop that text. The solution here works only if persName always contains choice with reg and orig.
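The following Python model illustrates this caveat. It uses standard-library xml.etree.ElementTree, not XSLT, and a hypothetical persName that contains text (" senior") outside choice. select_only mirrors the templates above and loses that text. skip_blank is one alternative idea: it processes every child but discards only the whitespace-only text that sits directly inside persName or choice. In XSLT, that idea might be written as an empty template such as `<xsl:template match="tei:persName/text()[not(normalize-space())] | tei:choice/text()[not(normalize-space())]"/>`. Treat that mapping as an untested suggestion.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment: persName contains text (" senior") outside choice.
# The TEI namespace is omitted for brevity.
SAMPLE = """<p>cum eis <persName>
<choice>
<orig>P. barrau</orig>
<reg>Poncium Barrau</reg>
</choice> senior</persName>, et</p>"""

SPAN_CLASS = {"reg": "interpretive", "orig": "diplomatic"}


def span(part):
    """Wrap a reg/orig element's text in the span used in the question."""
    return f'<span class="{SPAN_CLASS[part.tag]}">' + "".join(part.itertext()) + "</span>"


def select_only(elem):
    """Model of the templates above: for persName[choice], output only reg and orig."""
    out = [elem.text or ""]
    for child in elem:
        choice = child.find("choice") if child.tag == "persName" else None
        if choice is not None:
            out += [span(p) for p in choice.findall("reg") + choice.findall("orig")]
        else:
            out.append(select_only(child))
        out.append(child.tail or "")
    return "".join(out)


def keep_text(text, structural):
    """Inside persName/choice, drop whitespace-only text; keep everything else."""
    if structural and (text is None or not text.strip()):
        return ""
    return text or ""


def skip_blank(elem):
    """Process every child, discarding only whitespace-only text
    that sits directly inside persName or choice."""
    structural = elem.tag in ("persName", "choice")
    out = [keep_text(elem.text, structural)]
    for child in elem:
        out.append(span(child) if child.tag in SPAN_CLASS else skip_blank(child))
        out.append(keep_text(child.tail, structural))  # a tail belongs to the parent
    return "".join(out)


root = ET.fromstring(SAMPLE)
print(select_only(root))  # " senior" is missing
print(skip_blank(root))   # " senior" survives, and there is no space before the comma
```

If the non-whitespace text itself ended with a line break (for example "senior" followed by a newline before </persName>), skip_blank would keep that newline and the space before the comma would return. Real data may therefore need extra handling.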
Upvotes: 1
Reputation: 815
Posting a response to my own question in order to avoid a really long complicated post.
I adjusted this XSL:
<!-- parse persName into <spans> -->
<xsl:template match="tei:persName/tei:choice/tei:reg">
<span class="interpretive"><xsl:apply-templates/></span>
</xsl:template>
<xsl:template match="tei:persName/tei:choice/tei:orig">
<span class="diplomatic"><xsl:apply-templates/></span>
</xsl:template>
To this XSL:
<!-- parse persName into <spans> -->
<xsl:template match="tei:persName">
<span class="interpretive"><xsl:apply-templates select="tei:choice/tei:reg"/></span>
<span class="diplomatic"><xsl:apply-templates select="tei:choice/tei:orig"/></span>
</xsl:template>
Now it outputs the HTML exactly as needed, with no other changes to the XSL file. I don't understand why this makes a difference, but it's a big difference.
New HTML:
<span class="interpretive">Arnaldum Prader</span><span
class="diplomatic">ar. p<sup class="subsup">a</sup>der</span> et
socium eius hereticos et vidit ibi cum eis <span
class="interpretive">Poncium Barrau</span><span class="diplomatic">P.
barrau</span>, <span class="interpretive">Iordanetum del Mas</span>
<span class="diplomatic">Iordanetū<span class="line_num diplomatic">
<span class="interpretive"> </span>del maſ</span>, <span
class="interpretive">Iordanum de Quiders</span><span
class="diplomatic">Iordanū de quiders</span> et <span
class="interpretive">Willelmum Vitalis</span><span
class="diplomatic">W. Vitał</span> predictum et <span
class="interpretive">uxor dicti Willelmi Vitalis</span><span
class="diplomatic">ux̄ dc̄ī W. Vitał</span>.
Upvotes: 0