Amberite

Reputation: 1409

Double-unescaping raw HTML inside XSL?

I am working with an XML file that has raw HTML stored inside a node's attribute (<node data="HTML...">).

I just realized that the HTML is double-encoded, so that, instead of being:

&lt;div&gt;

It is actually written as:

&amp;lt;div&amp;gt;

This means that if I do something like:

<xsl:value-of select="node/@data" disable-output-escaping="yes" />

I still get a singly escaped value:

&lt;div&gt;

What's the easiest way of unescaping this once again?

Upvotes: 3

Views: 609

Answers (1)

Tomalak

Reputation: 338316

It's definitely not pretty, but basically you are looking at a limited number of string replace operations:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="utf-8" />

  <xsl:variable name="ampDbl" select="'&amp;amp;'" />
  <xsl:variable name="amp" select="'&amp;'" />
  <xsl:variable name="ltDbl" select="'&amp;lt;'" />
  <xsl:variable name="lt" select="'&lt;'" />
  <xsl:variable name="gtDbl" select="'&amp;gt;'" />
  <xsl:variable name="gt" select="'&gt;'" />

  <xsl:template match="/">
    <xsl:apply-templates select="//@data" mode="unescape" />
  </xsl:template>

  <xsl:template match="@data" mode="unescape">
    <xsl:variable name="step1">
      <xsl:call-template name="StringReplace">
        <xsl:with-param name="s" select="string()" />
        <xsl:with-param name="search" select="$ltDbl" />
        <xsl:with-param name="replace" select="$lt" />
      </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="step2">
      <xsl:call-template name="StringReplace">
        <xsl:with-param name="s" select="$step1" />
        <xsl:with-param name="search" select="$gtDbl" />
        <xsl:with-param name="replace" select="$gt" />
      </xsl:call-template>
    </xsl:variable>
    <!-- ampersands go last; doing them first would unescape the other entities twice -->
    <xsl:variable name="step3">
      <xsl:call-template name="StringReplace">
        <xsl:with-param name="s" select="$step2" />
        <xsl:with-param name="search" select="$ampDbl" />
        <xsl:with-param name="replace" select="$amp" />
      </xsl:call-template>
    </xsl:variable>
    <xsl:value-of select="$step3" disable-output-escaping="yes" />
  </xsl:template>

  <!-- generic string replace template -->
  <xsl:template name="StringReplace">
    <xsl:param name="s"       select="''" />
    <xsl:param name="search"  select="''" />
    <xsl:param name="replace" select="''" />

    <xsl:choose>
      <xsl:when test="contains($s, $search)">
        <xsl:value-of select="substring-before($s, $search)" />
        <xsl:value-of select="$replace" />
        <xsl:variable name="rest" select="substring-after($s, $search)" />
        <xsl:if test="$rest">
          <xsl:call-template name="StringReplace">
            <xsl:with-param name="s"       select="$rest" />
            <xsl:with-param name="search"  select="$search" />
            <xsl:with-param name="replace" select="$replace" />
          </xsl:call-template>
        </xsl:if>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$s" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

When applied to

<root>
  <node data="&amp;lt;div&amp;gt;bla &amp;amp;amp; bla&amp;lt;/div&amp;gt;" />
</root>

it gives (in source code):

<div>bla &amp; bla</div>

which of course becomes this on screen:

bla & bla

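You might want to add another step that turns '&amp;quot;' into '&quot;'. Like the &lt; and &gt; steps, it has to run before the ampersand step, otherwise an entity that was meant to stay literal gets unescaped one level too far.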
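
If an XSLT 2.0 processor is available (that is an assumption on my part; the stylesheet above is plain 1.0), the same chain of replacements, including the quote step, can be sketched with the built-in replace() function instead of a recursive template. Treat this as a minimal sketch rather than a drop-in solution; it keeps the same order, with the ampersand handled last:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="utf-8" />

  <xsl:template match="/">
    <xsl:apply-templates select="//@data" mode="unescape" />
  </xsl:template>

  <!-- Undo one level of escaping: &lt;, &gt; and &quot; first, &amp; last. -->
  <xsl:template match="@data" mode="unescape">
    <xsl:value-of disable-output-escaping="yes"
      select="replace(
                replace(
                  replace(
                    replace(., '&amp;lt;', '&lt;'),
                    '&amp;gt;', '&gt;'),
                  '&amp;quot;', '&quot;'),
                '&amp;amp;', '&amp;')" />
  </xsl:template>
</xsl:stylesheet>
```

The search strings contain no regex metacharacters and the replacement strings contain neither "$" nor "\", so replace() treats them literally here. For the sample input above this should produce the same output as the 1.0 version.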

Upvotes: 2
