Michael Bailey


XSLT transform removes HTML elements from mixed-content

Is it possible for XSLT to preserve anchors and other embedded HTML tags within XML?

Background: I am trying to convert an HTML document into XML with an XSL stylesheet using XSLT. The original HTML document had content interspersed with anchor tags (e.g. Some hyperlinks here and there). I've copied that content into my XML, but the XSLT output lacks anchor tags.

Example XML:

<?xml version="1.0" ?>
<observations>
  <observation><a href="http://jwz.org">Hyperlinks</a> disappear.</observation>
</observations>

Example XSL:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/html">

  <xsl:output method="html" indent="yes" encoding="UTF-8"/>

  <xsl:template match="/observations">
  <html>
    <body>
      <xsl:value-of select="observation"/>
    </body>
  </html>
  </xsl:template>

</xsl:stylesheet>

Output:

<html xmlns="http://www.w3.org/1999/html">
<body>Hyperlinks disappear.</body>
</html>

I've read a few similar questions on Stack Overflow and checked the identity transform page on Wikipedia. I started to get some interesting results using xsl:copy-of, but I don't understand enough about XSLT to get all of the words and tags embedded within each XML element to appear in the resulting HTML. Any help would be appreciated.


Answers (1)

Mathias Müller


Write a separate template that matches a elements and copies their attributes and content.

What is wrong with your approach? In your code,

<xsl:value-of select="observation"/>

simply sends to the output the string value of the observation element (in XSLT 1.0, of the first observation element if there are several). Its string value is the concatenation of all the text nodes it contains, at any depth. But you need not only the text nodes in it, but also the a elements themselves.
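
To make the difference concrete, here is a minimal sketch that swaps xsl:value-of for xsl:copy-of, the instruction mentioned in the question. It assumes the embedded markup can be copied through unchanged, and it leaves out the default namespace declaration from the question's stylesheet so that the copied elements and the literal html and body elements are all in no namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" indent="yes" encoding="UTF-8"/>

  <xsl:template match="/observations">
    <html>
      <body>
        <!-- xsl:value-of would emit only the string value of the first
             observation, which is why the tags vanish. -->
        <!-- xsl:copy-of emits a deep copy of every child node of every
             observation, so a elements and their attributes survive. -->
        <xsl:copy-of select="observation/node()"/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

This is enough when nothing inside observation needs to be changed. If some of the embedded elements have to be renamed, filtered or wrapped, templates are the more flexible route.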

The default behaviour of an XSLT processor is to "skip" element nodes, because of a built-in template: it outputs nothing for the element itself and only goes on to process its children. So, if you apply templates and no template matches a, its start and end tags are dropped and only its text content is output.
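
For reference, the built-in rules of XSLT 1.0 behave as if the following templates were part of every stylesheet. Spelling them out is only an illustration (the built-in rule for modes is left out for brevity); you never need to add them yourself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Root node and element nodes: output nothing for the node itself,
       just carry on with its children. -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: output their string value. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Processing instructions and comments: output nothing. -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

A stylesheet that contains no other templates therefore reduces any input document to its text content, with all tags gone.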

Stylesheet

Note: This stylesheet still relies on the default behaviour of the XSLT processor to some extent. The order of events will resemble the following:

1. The template where match="/observations" is matched. It adds html and body to the output.
2. Then, a template rule must be found for the content of observations. A built-in template matches observation, does nothing with it, and looks for templates to process its content.
3. For the a element, the corresponding template is matched, which copies the element and its attributes.
4. Finally, a built-in template copies the text nodes inside observation and a.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" indent="yes" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/observations">
  <html>
    <body>
      <xsl:apply-templates/>
    </body>
  </html>
  </xsl:template>

  <xsl:template match="a">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
          <xsl:apply-templates/>
      </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
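
Since the question also asks about other embedded HTML tags, the same idea can be generalised. The following is a sketch under a few assumptions that are not in the original question: every element nested inside an observation is HTML that should pass through unchanged, each observation becomes its own p element, and whitespace is stripped only from the observations container, so that a space between two inline elements is not lost.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" indent="yes" encoding="UTF-8"/>
  <!-- Only drop the indentation between observation elements. -->
  <xsl:strip-space elements="observations"/>

  <xsl:template match="/observations">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- Assumption: one paragraph per observation. -->
  <xsl:template match="observation">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

  <!-- Any element at any depth inside an observation (a, em, span, ...):
       copy the element and its attributes, then process its children. -->
  <xsl:template match="observation//*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Text nodes are still handled by the built-in template, exactly as in the stylesheet above.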

XML Output

<html>
   <body><a href="http://jwz.org">Hyperlinks</a> disappear.
   </body>
</html>

