xslt - copy template adds unexpected attributes

I have the following XML file:

<?xml version="1.0"?>
<!DOCTYPE reference PUBLIC "-//OASIS//DTD DITA Reference//EN" "dtd/reference.dtd">
<!--<!DOCTYPE reference PUBLIC "-//OASIS//DTD DITA Reference//EN" "dtd/reference.dtd">-->
<reference xml:lang="en-us" id="D609">
    <title>Body Text</title>
    <shortdesc>A short desc.</shortdesc>
    <prolog>
        <metadata/>
    </prolog>
    <refbody>
        <section/>
    </refbody>
</reference>

I just want to add some elements to it, so I simply run a copy template like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema">        


  <xsl:template match="/">

    <!-- Get the DOCTYPE comment -->
    <xsl:variable name="d"
      select="//comment()[contains(.,'DOCTYPE')][1]" />
    <xsl:variable name="doctype" select="substring($d,0)" />
    <xsl:message select="$doctype" />

    <!-- Output the DOCTYPE -->
    <xsl:value-of disable-output-escaping="yes" select="$doctype" />  

    <xsl:apply-templates />  
  </xsl:template>

  <!-- Suppress the DOCTYPE comment; its content was already written out above -->
  <xsl:template match="comment()[contains(., 'DOCTYPE')]">
  </xsl:template>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>  
</xsl:stylesheet>

Instead of getting what I expected to be exactly the same output, I get this:

<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE reference PUBLIC "-//OASIS//DTD DITA Reference//EN" "dtd/reference.dtd">
<reference xml:lang="en-us" id="D609" DTDVersion="V1.1.3"
    domains="(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d) (topic ut-d)"
    class="- topic/topic       reference/reference ">
    <title class="- topic/title ">Body Text</title>
    <shortdesc class="- topic/shortdesc ">A short desc.</shortdesc>
    <prolog class="- topic/prolog ">
        <metadata class="- topic/metadata "/>
    </prolog>
    <refbody class="- topic/body        reference/refbody ">
        <section class="- topic/section "/>
    </refbody>
</reference>

So basically, I get a class attribute on every element. My <reference> element is also decorated with some extra attributes that appeared as if by magic (to me).

Where do the attributes come from? How do I get rid of them?

I'm thinking it might be related to the DTD or doctype that I try to copy as well, but I'm not sure.

Upvotes: 0

Views: 198

Answers (1)

Michael Kay

Reputation: 163625

The input to the XSLT processor comes from the XML parser, and the form of this input is (logically) a tree of nodes whose detailed form is defined by the XDM data model. If the XML parser is a validating parser (driven by element and attribute definitions in the DTD), then the tree passed by the parser to the XSLT engine will generally contain not only the attributes that were explicitly present in the source, but also those for which default values have been defined in the DTD. The XDM model does not distinguish explicit and implicit attributes, and XSLT therefore treats both in the same way.

Some XSLT processors (or XML parsers) may have options to exclude defaulted attributes from the XDM model of the input tree. With Saxon, for example, this can be achieved using the -expand:off option when running from the command line, or similar options when running via the Java or .NET API. If there is no such option, then your best bet is probably to avoid validation of the input against a DTD; the details of how to do this depend on your XML parser / XSLT processor combination.
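
To see the mechanism in isolation, here is a minimal sketch that uses Python's bundled expat parser rather than Saxon or any XSLT processor. The miniature document and the helper name attributes_seen are made up for illustration. The internal DTD subset stands in for the DITA DTD by declaring a default for the class attribute. expat's specified_attributes switch plays roughly the role that -expand:off plays in Saxon, though the two are unrelated implementations.

```python
import xml.parsers.expat

# Hypothetical miniature of the question's input: the internal DTD subset
# declares a default value for the class attribute, much as the DITA DTD does.
DOC = """<?xml version="1.0"?>
<!DOCTYPE reference [
  <!ATTLIST reference class CDATA "- topic/topic reference/reference ">
]>
<reference id="D609"/>
"""


def attributes_seen(report_defaults):
    """Parse DOC and return the attributes the parser reports for <reference>."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()
    # With specified_attributes set, expat reports only the attributes that
    # were written in the document, not those supplied by DTD defaults.
    parser.specified_attributes = 0 if report_defaults else 1

    def start_element(name, attrs):
        seen.update(attrs)

    parser.StartElementHandler = start_element
    parser.Parse(DOC, True)
    return seen


with_defaults = attributes_seen(report_defaults=True)
without_defaults = attributes_seen(report_defaults=False)
print(sorted(with_defaults))
print(sorted(without_defaults))
```

In the first run the parser hands over class alongside id even though class never appears in the element's start tag. That is the same thing the XSLT identity template then copies faithfully to the output. In the second run only id is reported.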

Upvotes: 2
