wasmachien

Reputation: 1009

Preserving attribute new lines in XSLT

Input XML:

<element attr="a  b  
c

d"/>

XSL:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
  <xsl:mode on-no-match="shallow-copy"/>
  <xsl:output method="xml" indent="yes"/>
</xsl:stylesheet>

Result: <element attr="a b c d"/>

Using Saxon 9.9.

Why are the line breaks stripped? Does the XML spec say that they are not significant in attributes? Are there any workarounds to keep them?

Upvotes: 2

Views: 316

Answers (1)

Daniel Haley

Reputation: 52878

I think what you're seeing is attribute-value normalization. First, all line breaks in the input are normalized to #xA. Then each of those is replaced with a space (#x20).

This is based on these statements in the XML spec (section 3.3.3, Attribute-Value Normalization):

All line breaks must have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.

and

For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
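
A quick way to see this rule in action outside of Saxon is to parse the question's input with another conforming parser. The sketch below uses Python's standard `xml.etree.ElementTree` (which parses with expat) purely as an illustration; it is not part of the original setup, and the variable names are made up for the example. Each literal line break in the attribute should come back as a single space.

```python
import xml.etree.ElementTree as ET

# The question's input, with real line breaks inside the attribute value.
source = '<element attr="a  b  \nc\n\nd"/>'

# The parser applies attribute-value normalization before the
# application (or an XSLT processor) ever sees the value.
value = ET.fromstring(source).get("attr")

# Show the value exactly as the application receives it.
print(repr(value))
```

Because the replacement happens in the parser, the stylesheet never has a chance to see the original line breaks.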

The only "workaround" I can think of is to preprocess the XML to replace the newlines with character references. This is based on:

Note that if the unnormalized attribute value contains a character reference to a white space character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9).

The serializer will still write character references when the value is output as an attribute, but you'll get actual newlines when the value is output as element content or text:

Replaced newlines:

<element attr="a  b  &#xA;    c&#xA;    &#xA;    d"/>

XSLT

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:mode on-no-match="shallow-copy"/>
  <xsl:output method="xml" indent="yes"/>
  
  <xsl:template match="/*">
    <test attr="{@attr}">
      <xsl:value-of select="@attr"/>
    </test>
  </xsl:template>
  
</xsl:stylesheet>

Output

<test attr="a  b  &#xA;    c&#xA;    &#xA;    d">a  b  
    c
    
    d</test>
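
The preprocessing step itself has to run on the raw text, before any XML parser touches the file. A minimal sketch in Python, assuming the input contains no comments, CDATA sections, or processing instructions with tag-like text in them (a regex is not a full XML tokenizer); the function name `escape_attribute_newlines` is hypothetical:

```python
import re
import xml.etree.ElementTree as ET

# A start tag or empty-element tag, including its quoted attribute values.
TAG = re.compile(r'<[A-Za-z_][^<>"\']*(?:(?:"[^"]*"|\'[^\']*\')[^<>"\']*)*>')
# A single- or double-quoted attribute value inside such a tag.
QUOTED = re.compile(r'"[^"]*"|\'[^\']*\'')


def escape_attribute_newlines(xml_text: str) -> str:
    """Replace literal line breaks inside attribute values with &#xA;."""

    def fix_value(match: re.Match) -> str:
        value = match.group(0)
        # Apply end-of-line handling first (CRLF and lone CR become LF).
        value = value.replace("\r\n", "\n").replace("\r", "\n")
        return value.replace("\n", "&#xA;")

    def fix_tag(match: re.Match) -> str:
        return QUOTED.sub(fix_value, match.group(0))

    return TAG.sub(fix_tag, xml_text)


raw = '<element attr="a  b  \nc\n\nd"/>'
escaped = escape_attribute_newlines(raw)
print(escaped)

# After escaping, the parsed attribute value keeps its line breaks.
print(repr(ET.fromstring(escaped).get("attr")))
```

Line breaks in element content are left alone, since only text inside quoted attribute values within a tag is rewritten.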

Upvotes: 4
