Reputation: 3446
If the contents of a citations node are something like the following:
<p>
WAJWAJADS:
</p>
<p>
asdf
</p>
<p>
ALSOAS:
</p>
<p>
lorem ipsum...<br />
lorem<br />
blah blah <i>
adfas &amp; dasdsaafs
</i>, April 2011.<br />
lorem lorem dear lord the whitespace
</p>
Is there any way to transform this to properly formatted HTML with XSLT?
normalize-space() just concatenates everything together. The best I've managed to do is apply normalize-space() to all p descendants within a for-each loop and wrap them in a p element. However, any inner tags are then still lost.
Is there a better way to parse this WYSIWYG generated trainwreck? Unfortunately I have no control over the generated XML.
Upvotes: 3
Views: 1643
Reputation: 117165
This question would have been a lot easier to understand if the example contained real text instead of gibberish. "No additional whitespace between node start/end and text." is not an accurate enough description of the expected result.
I am going to take a guess here and assume you actually want to perform a "run of spaces to one space" operation on all the text nodes. This could be done as follows:
XSLT 1.0
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()" priority="1">
    <xsl:variable name="temp" select="normalize-space(concat('x', ., 'x'))" />
    <xsl:value-of select="substring($temp, 2, string-length($temp) - 2)"/>
  </xsl:template>
</xsl:stylesheet>
When applied to the following test input:
<chapter>
<p>
This question would have
been a lot <b> easier </b> to understand
if the example contained
<i> real </i> text instead of
gibberish.
</p>
<p>
Here is an example of preserving zero spaces
between text nodes:<br/>(continued) on a new line.
</p>
<p>
Here is another example of
preserving zero spaces within a text
node: <i>some text in italic</i> followed
by normal text.
</p>
</chapter>
the result will be:
<?xml version="1.0" encoding="UTF-8"?>
<chapter>
<p> This question would have been a lot <b> easier </b> to understand if the example contained <i> real </i> text instead of gibberish. </p>
<p> Here is an example of preserving zero spaces between text nodes:<br/>(continued) on a new line. </p>
<p> Here is another example of preserving zero spaces within a text node: <i>some text in italic</i> followed by normal text. </p>
</chapter>
--
Note that there will be no difference between the input and output when rendered in HTML.
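The 'x' padding in the text() template is what stops normalize-space() from trimming: the sentinels occupy both ends of the string, so leading and trailing whitespace is collapsed to a single space instead of being removed, and the substring() call then drops the sentinels again. If an XSLT 2.0 processor is available (an assumption; the question does not say which version is in use), the same collapse could be sketched with the regex function replace():

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- collapse every run of whitespace to one space, without trimming the ends -->
  <xsl:template match="text()" priority="1">
    <xsl:value-of select="replace(., '\s+', ' ')"/>
  </xsl:template>
</xsl:stylesheet>
```

In XPath 2.0 regular expressions, \s matches space, tab, newline and carriage return, which are the same characters normalize-space() treats as whitespace.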
Upvotes: 0
Reputation: 23627
You first need well-formed XML with a single root element.
Assuming you have that, you can apply an identity transform to copy the source tree to the result, strip whitespace-only text between the tags, optionally generate indented HTML output (without the XML declaration), and use normalize-space() only on the text nodes.
Try this stylesheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes" method="html"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>
</xsl:stylesheet>
The result applied to the data you provided will be:
<p>WAJWAJADS:</p>
<p>asdf</p>
<p>ALSOAS:</p>
<p>lorem ipsum...<br>lorem<br>blah blah<i>adfas &amp; dasdsaafs</i>, April 2011.<br>lorem lorem dear lord the whitespace
</p>
You can see the result applied to your example in this XSLT Fiddle
UPDATE 1: to add an extra space around each text node (and avoid concatenation when the string value of the node is calculated) you can replace the last template with:
<xsl:template match="text()">
  <xsl:value-of select="concat(' ',normalize-space(.),' ')"/>
</xsl:template>
Result:
<html>
<p> WAJWAJADS: </p>
<p> asdf </p>
<p> ALSOAS: </p>
<p> lorem ipsum... <br> lorem <br> blah blah <i> adfas &amp; dasdsaafs </i> , April 2011. <br> lorem lorem dear lord the whitespace
</p>
</html>
See: http://xsltransform.net/3NzcBsE/1
UPDATE 2: to add a space or newline after each copied element, place <xsl:text>&#xa;</xsl:text> (for a newline) or <xsl:text> </xsl:text> (for a space) after the </xsl:copy> in the first template:
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
  <xsl:text>&#xa;</xsl:text>
</xsl:template>
Result:
<html>
<p>WAJWAJADS:</p>
<p>asdf</p>
<p>ALSOAS:</p>
<p>lorem ipsum...<br>
lorem<br>
blah blah<i>adfas &amp; dasdsaafs</i>
, April 2011.<br>
lorem lorem dear lord the whitespace
</p>
</html>
See: http://xsltransform.net/3NzcBsE/2
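One caveat with UPDATE 2, offered as a sketch rather than as part of the fiddle above: because the template matches @*|node(), the extra text is also emitted after each copied attribute. On an element with two or more attributes, the second attribute would then arrive after text has already been added to the element, which XSLT 1.0 treats as an error (a processor may recover by ignoring the attribute). The sample data has no attributes, so it is unaffected. Assuming the input may carry attributes, a separate element-only template with an explicit priority keeps the newline out of the attribute path:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes" method="html"/>

  <!-- identity transform for attributes, comments and processing instructions -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- elements only: copy the element, then emit a newline after its end tag -->
  <xsl:template match="*" priority="1">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>

  <!-- text nodes: trim and collapse whitespace -->
  <xsl:template match="text()" priority="1">
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>
</xsl:stylesheet>
```

The explicit priority="1" is there because match="*", match="text()" and match="node()" all share the same default priority, so without it the choice between templates would depend on the processor's conflict recovery.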
Upvotes: 4
Reputation: 7173
I've slightly modified the answer by Martin Honnen:
<xsl:template match="text()">
  <xsl:value-of select="normalize-space(.)"/>
  <!-- keep one trailing space when the node ends in exactly one space -->
  <xsl:if test="substring(., string-length(.)) = ' ' and substring(., string-length(.) - 1, 2) != '  '">
    <xsl:text> </xsl:text>
  </xsl:if>
</xsl:template>
It tests whether the last character is a space and the last two characters are not both spaces; if so, it inserts a space.
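A variation on the same idea, as a sketch under stated assumptions rather than a tested drop-in: keep a single space at either end of a text node only when the node begins or ends with whitespace and has a sibling on that side (for example an inline i or b element). It assumes whitespace-only text nodes are stripped with xsl:strip-space, as in the other answers; without that, a whitespace-only node between two elements would produce two spaces.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- assumption: whitespace-only text nodes are removed from the source tree -->
  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes" method="html"/>

  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()" priority="1">
    <!-- leading space: only if the node starts with whitespace and has a preceding sibling -->
    <xsl:if test="preceding-sibling::node() and normalize-space(substring(., 1, 1)) = ''">
      <xsl:text> </xsl:text>
    </xsl:if>
    <xsl:value-of select="normalize-space(.)"/>
    <!-- trailing space: only if the node ends with whitespace and has a following sibling -->
    <xsl:if test="following-sibling::node() and normalize-space(substring(., string-length(.))) = ''">
      <xsl:text> </xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Here normalize-space() applied to a single character yields the empty string exactly when that character is whitespace, which covers newlines and tabs as well as spaces.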
Upvotes: 4
Reputation: 167716
Use the identity transformation template plus a template for text nodes that applies normalize-space():
<xsl:template match="text()"><xsl:value-of select="normalize-space()"/></xsl:template>
Upvotes: 1