Ismar Slomic

Reputation: 5514

XSLT Identity Transformation without change to the output

Is it possible to do an XSLT identity transformation where absolutely nothing is changed from the source?

When I use the following template, the indentation and line breaks are changed in the output, and I don't want to make any changes to the source XML.

XSLT

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Identity template: copies every attribute and node as-is -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>
</xsl:stylesheet>

INPUT

<S:Envelope
  xmlns:S="http://www.w3.org/2003/05/soap-envelope" 
  xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing"
  xmlns:f123="http://www.fabrikam123.example/svc53">
  <S:Header>
    <wsa:MessageID>
      uuid:aaaabbbb-cccc-dddd-eeee-wwwwwwwwwww
    </wsa:MessageID>
    <wsa:RelatesTo>
      uuid:aaaabbbb-cccc-dddd-eeee-ffffffffffff
    </wsa:RelatesTo>
    <wsa:To S:mustUnderstand="1">
      http://business456.example/client1
    </wsa:To>
    <wsa:Action>http://fabrikam123.example/mail/DeleteAck</wsa:Action>
  </S:Header>
  <S:Body>
    <f123:DeleteAck/>
  </S:Body>
</S:Envelope>

OUTPUT

<?xml version="1.0" encoding="UTF-8"?><S:Envelope xmlns:S="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing" xmlns:f123="http://www.fabrikam123.example/svc53">
  <S:Header>
    <wsa:MessageID>
      uuid:aaaabbbb-cccc-dddd-eeee-wwwwwwwwwww
    </wsa:MessageID>
    <wsa:RelatesTo>
      uuid:aaaabbbb-cccc-dddd-eeee-ffffffffffff
    </wsa:RelatesTo>
    <wsa:To S:mustUnderstand="1">
      http://business456.example/client1
    </wsa:To>
    <wsa:Action>http://fabrikam123.example/mail/DeleteAck</wsa:Action>
  </S:Header>
  <S:Body>
    <f123:DeleteAck/>
  </S:Body>
</S:Envelope>

Upvotes: 3

Views: 2760

Answers (3)

Francis Avila

Reputation: 31641

No, you cannot. The input and output XML will be the "same" in the sense that they produce the same XML Infoset, but they will not necessarily be byte-for-byte identical and this is not something that XSLT can control.

Why do you need this? If you are trying to compare XML documents easily, consider using XML Canonicalization. Many XML libraries have a method of producing canonical XML, and the xmllint command line tool can produce it easily from files.
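As a rough sketch of the canonicalization idea, here is how it might look with Python's standard library (`xml.etree.ElementTree.canonicalize`, available from Python 3.8). The two sample strings are made up for illustration; they differ only in attribute layout and in how the empty element is written.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+

# Two serializations of the same document (hypothetical sample data):
# they differ in whitespace between attributes and in the empty-element form.
multi_line = '<doc\n  a="1"\n  b="2"><empty></empty></doc>'
single_line = '<doc a="1" b="2"><empty/></doc>'

# canonicalize() returns the canonical (C14N 2.0) form as a string.
canonical_multi = canonicalize(multi_line)
canonical_single = canonicalize(single_line)

print(multi_line == single_line)            # False: the bytes differ
print(canonical_multi == canonical_single)  # True: same canonical form
```

Comparing the canonical forms rather than the raw files sidesteps the differences that the transformation cannot preserve.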

Upvotes: 3

Ian Roberts

Reputation: 122414

In general it's not possible to be 100% confident that you'll get exactly everything unchanged, because the XSLT data model simply doesn't preserve all the information from the parse. For example, if the input contains &#x3C; then the output might contain &lt;. Similarly, CDATA sections aren't preserved: adjacent text nodes (CDATA sections and normal text nodes) are merged into one at parse time, and while you can configure the processor to use CDATA for the content of certain elements, you can't simply preserve them as they were.

There are other issues such as the fact that the data model doesn't distinguish between <foo></foo>, <foo/> and <foo /> - they all represent the same empty element and any of them from the input could be represented by any of them in the output. And as in your example white space between attributes within a start tag is not preserved.

But of course these differences are all things that an XML tool shouldn't care about as they're different ways to represent exactly the same infoset.
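The same loss of detail can be seen with any XML parser, not just an XSLT processor. The sketch below uses Python's `xml.etree.ElementTree` as a stand-in (an assumption for illustration; its tree is not the XSLT data model, but it discards the same lexical details at parse time). The helper `round_trip` and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def round_trip(xml_text):
    """Parse a string and serialize it again, as an identity transform would."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


# A character reference followed by a CDATA section: after parsing, both are
# just characters in a single text value, with no record of how they were written.
parsed = ET.fromstring("<r>&#x3C;<![CDATA[a&b]]></r>")
merged_text = parsed.text
print(repr(merged_text))

# On output the serializer picks its own escaping; the CDATA section is gone.
print(round_trip("<r>&#x3C;<![CDATA[a&b]]></r>"))

# The three spellings of an empty element all serialize the same way.
empty_forms = {round_trip(s) for s in ("<foo></foo>", "<foo/>", "<foo />")}
print(len(empty_forms))
```

Whichever form the serializer emits is its own choice, since the parsed tree no longer records which form the input used.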

Upvotes: 1

C. M. Sperberg-McQueen

Reputation: 25054

The default behavior of XSLT processors is to preserve whitespace in the input, and the behavior of the processors I've just tested is consistent with the spec.

But the whitespace in question is whitespace in the text nodes of the input.

The whitespace between attribute-value specifications in start-tags, and the whitespace between items (e.g. comments and processing instructions) in the prolog and epilog of the document are not text nodes, and are not affected by the preserve-space settings. That white space is also, in fact, not part of the XPath data model, so there is very little the processor can legitimately do to preserve it.
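To make the distinction concrete, here is a small sketch using Python's `xml.etree.ElementTree` in place of an XSLT processor (an assumption for illustration; the sample document is made up). Whitespace inside the element's text survives parsing, while the line breaks between attributes in the start-tag leave no trace in the parsed tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: attributes on separate lines, plus whitespace in the text.
source = '<a\n  x="1"\n  y="2">\n  text\n</a>'
element = ET.fromstring(source)

# Whitespace in the text node is part of the parsed data and is kept.
text_with_whitespace = element.text
print(repr(text_with_whitespace))

# Attributes are exposed only as name/value pairs; their layout is not stored.
attributes = dict(element.attrib)
print(attributes)

# The serializer therefore has to choose its own spacing in the start-tag.
reserialized = ET.tostring(element, encoding="unicode")
print(reserialized)
```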

If the whitespace in question carries information, you will want to revisit the design of the vocabulary (it's really a bad idea for that whitespace to be significant); if it's just that you would prefer that there be newlines between attribute-value specifications, you may want to write a custom serializer to insert such newlines and indentation on output. (If your motive is to avoid confusing a diff program with whitespace differences, my experience is that your choices are to normalize whitespace before diffing or to get a diff program that's a bit more robust in the face of whitespace variation.) Good luck.

Upvotes: 1
