Reputation: 75
I have two versions of a document encoded in one TEI XML file and want to output one of the versions as plain text. Here’s the sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
  schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Title</title>
      </titleStmt>
      <publicationStmt>
        <p>Publication Information</p>
      </publicationStmt>
      <sourceDesc>
        <p>Information about the source</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>John Q Doe was born in
        <app>
          <rdg wit="text1">Omaha</rdg>
          <rdg wit="text2">Lincoln</rdg>
        </app>
        in 1950. But was he
        <app>
          <rdg wit="text1">happy</rdg>
          <rdg wit="text2">glad</rdg>
        </app>?
        Some say no.
      </p>
    </body>
  </text>
</TEI>
And here’s the sample XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs tei" version="2.0">
  <xsl:output omit-xml-declaration="yes" method="text" encoding="UTF-8"/>
  <xsl:template match="text()">
    <xsl:value-of select="normalize-space()"/>
  </xsl:template>
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="tei:teiHeader">
  </xsl:template>
  <xsl:template match="tei:app">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="tei:rdg[@wit='text1']">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Cancel out the alternate version of the text -->
  <xsl:template match="tei:rdg[@wit='text2']">
  </xsl:template>
</xsl:stylesheet>
What I want to output is “John Q Doe was born in Omaha in 1950. But was he happy? Some say no.” What I end up with is “John Q Doe was born inOmahain 1950. But was hehappy? Some say no.” So I somehow need to preserve a single space around the app elements. I can’t use <xsl:preserve-space>, because I use extra whitespace for readability, and I can’t simply use <xsl:text> to insert spaces in the template that matches tei:app, because sometimes punctuation comes immediately after the <app> element, as the question mark does above. I’m stumped.
Upvotes: 2
Views: 1445
Reputation: 122414
It looks like you essentially want a variant of normalize-space() that collapses each run of whitespace (including runs at the start and end of the string) down to a single space, without also stripping the leading and trailing whitespace. Since you're using XSLT 2.0, you can do that with a simple regular expression:
<xsl:template match="text()">
  <xsl:value-of select="replace(., '\s+', ' ')"/>
</xsl:template>
You would also need to add
<xsl:strip-space elements="*"/>
to the top level of the stylesheet (as a child of xsl:stylesheet) in order to suppress text nodes that consist entirely of whitespace. Without it you'll end up with an extra space in your output for each all-whitespace text node (e.g. between <text> and <body>, <body> and <p>, </rdg> and </app>, etc.). The strip-space directive only affects all-whitespace text nodes; it does not touch the whitespace inside text nodes that also contain non-whitespace content.
Upvotes: 1
Reputation: 4749
You need to add the missing spaces yourself.
If you put an <xsl:text> </xsl:text> before and after the content of the reading, you will get a space on each side of the element:
<xsl:template match="tei:rdg[@wit='text1']">
  <xsl:text> </xsl:text>
  <xsl:apply-templates/>
  <xsl:text> </xsl:text>
</xsl:template>
This gives the following output:
John Q Doe was born in Omaha in 1950. But was he happy ? Some say no.
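The stray space before the question mark comes from the unconditional trailing <xsl:text>. One possible refinement, sketched here under assumptions, is to emit the trailing space only when the text that follows the <app> does not start with punctuation. The list of punctuation characters in the regular expression is an assumption, and the sketch keeps the question's normalize-space() template for text nodes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei" version="2.0">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- As in the question: trim and collapse whitespace in every text node. -->
  <xsl:template match="text()">
    <xsl:value-of select="normalize-space()"/>
  </xsl:template>

  <!-- Suppress the header and the alternate reading. -->
  <xsl:template match="tei:teiHeader | tei:rdg[@wit='text2']"/>

  <xsl:template match="tei:rdg[@wit='text1']">
    <xsl:text> </xsl:text>
    <xsl:apply-templates/>
    <!-- Assumption: only these characters count as punctuation that should
         attach directly to the reading. The test looks at the first text
         node following the parent app element. -->
    <xsl:if test="not(matches(normalize-space(../following-sibling::text()[1]), '^[?.,;:!]'))">
      <xsl:text> </xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

On the sample document this should print the sentence without the space before the question mark. It is more fragile than normalizing the text nodes themselves: an <app> at the very start of a paragraph would still get a leading space, and the first following text node is not necessarily adjacent to the <app> if another element sits in between.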
Upvotes: 0