tenub

Reputation: 3446

Transforming node contents to remove whitespace

If the contents of a citations node are something like the following:

                <p>

            WAJWAJADS:

            </p>

<p>

            asdf

            </p>

<p>

            ALSOAS:

            </p>

<p>

            lorem ipsum...<br />
lorem<br />
blah blah <i>

            adfas &amp; dasdsaafs

            </i>, April 2011.<br />
lorem lorem dear lord the whitespace

            </p>

Is there any way to transform this to properly formatted HTML with XSLT?

normalize-space() just concatenates everything together. The best I've managed is to call normalize-space() on all p descendants within a for-each loop and wrap each result in a p element. However, any inner tags are then still lost.

Is there a better way to parse this WYSIWYG generated trainwreck? Unfortunately I have no control over the generated XML.

Upvotes: 3

Views: 1643

Answers (4)

michael.hor257k

Reputation: 117165

This question would have been a lot easier to understand if the example contained real text instead of gibberish. "No additional whitespace between node start/end and text." is not an accurate enough description of the expected result.

I am going to take a guess here and assume you actually want to perform a "run of spaces to one space" operation on all the text nodes. This could be done as follows:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="text()" priority="1">
    <xsl:variable name="temp" select="normalize-space(concat('x', ., 'x'))" />
    <xsl:value-of select="substring($temp, 2, string-length($temp) - 2)"/>
</xsl:template>

</xsl:stylesheet>

When applied to the following test input:

<chapter>


           <p>

    This         question          would         have

been       a     lot    <b>   easier      </b>      to understand 

        if     the      example   contained     

   <i>     real  </i>    text    instead   of 

   gibberish.

                     </p>


    <p>

    Here     is       an      example       of     preserving   zero     spaces 

    between    text   nodes:<br/>(continued)       on   a new   line. 




    </p>


        <p>

    Here  is       another      example       of     

    preserving   zero     spaces     within    a      text

    node:     <i>some     text  in      italic</i>       followed    

    by   normal      text. 


    </p>


</chapter>

the result will be:

<?xml version="1.0" encoding="UTF-8"?>
<chapter>
   <p> This question would have been a lot <b> easier </b> to understand if the example contained <i> real </i> text instead of gibberish. </p>
   <p> Here is an example of preserving zero spaces between text nodes:<br/>(continued) on a new line. </p>
   <p> Here is another example of preserving zero spaces within a text node: <i>some text in italic</i> followed by normal text. </p>
</chapter>

--
Note that there will be no difference between the input and output when rendered in HTML.
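To see why the text template keeps exactly one space at each boundary, the sentinel trick can be traced on a single string. The following is only a minimal sketch: the `sample` variable and its value are made up for illustration, and the stylesheet ignores its input document.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<xsl:template match="/">
    <!-- hypothetical sample value, standing in for one text node -->
    <xsl:variable name="sample" select="'   easier      '"/>
    <!-- the 'x' sentinels stop normalize-space() from trimming the ends,
         so $temp becomes 'x easier x' -->
    <xsl:variable name="temp" select="normalize-space(concat('x', $sample, 'x'))"/>
    <!-- drop the two sentinels again; the brackets only make the spaces visible -->
    <xsl:value-of select="concat('[', substring($temp, 2, string-length($temp) - 2), ']')"/>
</xsl:template>

</xsl:stylesheet>
```

The output is `[ easier ]`: each run of whitespace, including the leading and trailing ones, has been reduced to a single space rather than removed.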

Upvotes: 0

helderdarocha

Reputation: 23627

You first need well-formed XML with a single root element.

Assuming you have that, you can apply an identity transform to copy the source tree to the result, strip spaces between the tags, optionally generate output in HTML (without the XML declaration) and indented, and use normalize-space() only in the text nodes.

Try this stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:strip-space elements="*"/>
    <xsl:output indent="yes" method="html"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="text()">
         <xsl:value-of select="normalize-space(.)"/>
    </xsl:template>

</xsl:stylesheet>

The result applied to the data you provided will be:

<p>WAJWAJADS:</p>
<p>asdf</p>
<p>ALSOAS:</p>
<p>lorem ipsum...<br>lorem<br>blah blah<i>adfas &amp; dasdsaafs</i>, April 2011.<br>lorem lorem dear lord the whitespace
</p>

You can see the result applied to your example in this XSLT Fiddle

UPDATE 1: to add an extra space around each text node (and avoid concatenation when the string value of the node is calculated) you can replace the last template with:

<xsl:template match="text()">
    <xsl:value-of select="concat(' ',normalize-space(.),' ')"/>
</xsl:template>

Result:

<html>
   <p> WAJWAJADS: </p>
   <p> asdf </p>
   <p> ALSOAS: </p>
   <p> lorem ipsum... <br> lorem <br> blah blah <i> adfas &amp; dasdsaafs </i> , April 2011. <br> lorem lorem dear lord the whitespace 
   </p>
</html>

See: http://xsltransform.net/3NzcBsE/1

UPDATE 2: to add a space or newline after each copied element, place <xsl:text>&#xa;</xsl:text> (for a newline) or <xsl:text> </xsl:text> (for a space) after the </xsl:copy> in the first template:

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    <xsl:text>&#xa;</xsl:text>
</xsl:template>

Result:

<html>
   <p>WAJWAJADS:</p>

   <p>asdf</p>

   <p>ALSOAS:</p>

   <p>lorem ipsum...<br>
      lorem<br>
      blah blah<i>adfas &amp; dasdsaafs</i>
      , April 2011.<br>
      lorem lorem dear lord the whitespace
   </p>

</html>

See: http://xsltransform.net/3NzcBsE/2

Upvotes: 4

Joel M. Lamsen

Reputation: 7173

I've slightly modified the answer by Martin Honnen:

<xsl:template match="text()">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:if test="substring(., string-length(.)) = ' ' and substring(., string-length(.) - 1, string-length(.)) != '  '">
        <xsl:text> </xsl:text>
    </xsl:if>
</xsl:template>

It tests whether the last character is a space and the last two characters are not both spaces; if so, it inserts a space.
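The template above only looks for a trailing space character. A possible variant, which is not part of the original answer, assumes that leading whitespace should be handled the same way and that tabs and newlines count as whitespace too. The `hasText` variable name is purely illustrative.

```xml
<xsl:template match="text()">
    <!-- illustrative variable: true when the node holds any non-whitespace -->
    <xsl:variable name="hasText" select="normalize-space(.) != ''"/>
    <!-- the first character is whitespace if normalize-space() of it is empty -->
    <xsl:if test="$hasText and normalize-space(substring(., 1, 1)) = ''">
        <xsl:text> </xsl:text>
    </xsl:if>
    <xsl:value-of select="normalize-space(.)"/>
    <!-- same check for the last character -->
    <xsl:if test="$hasText and normalize-space(substring(., string-length(.))) = ''">
        <xsl:text> </xsl:text>
    </xsl:if>
</xsl:template>
```

With this sketch, a text node consisting of a newline followed by `blah blah ` comes out as ` blah blah `, so the words on either side of an inline element such as `<i>` stay separated. Text nodes containing only whitespace produce no output at all.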

Upvotes: 4

Martin Honnen

Reputation: 167716

Use the identity transformation template plus a template for text nodes that applies normalize-space():

<xsl:template match="text()"><xsl:value-of select="normalize-space()"/></xsl:template>

Upvotes: 1
