thugsb

Reputation: 23436

Remove unwanted tags with XSL

I've got some unknown content coming in as a description, maybe something like this:

<description>
  <p>
    <span>
      <font>Hello</font>
    </span>
    World! 
    <a href="/index">Home</a>
  </p>
</description>

There could conceivably be any HTML tag in there, and I don't want all of them. The tags I want to allow are p, i, em, strong, b, ol, ul, li and a. So, for example, <font> would be stripped, but <p> and <a> would remain. I'm assuming I have to match the ones I want (and make sure nothing matches the others), but I can't work out how to do it.

Any help?

Upvotes: 4

Views: 4934

Answers (1)

Wayne

Reputation: 60424

Whitelist those elements:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="*[not(self::description or self::p or self::i or 
                               self::em or self::strong or self::b or 
                               self::ol or self::ul or self::li or self::a)]"/>
</xsl:stylesheet>

Note that this removes the undesired elements and anything below them. To just strip the font element itself, for example, but allow its children, modify the last template like this:

<xsl:template match="*[not(self::description or self::p or self::i or 
                           self::em or self::strong or self::b or 
                           self::ol or self::ul or self::li or self::a)]">
    <xsl:apply-templates/>
</xsl:template>
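
For reference, here is a minimal sketch of a complete stylesheet using the strip-but-keep-children variant, assuming the input looks like the sample description in the question. It differs from the first stylesheet only in that the final template has a body; the comments are added for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <!-- Identity template: copy every node and attribute unchanged -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- Any element not on the whitelist: drop the tag and its attributes,
         but keep processing its child nodes -->
    <xsl:template match="*[not(self::description or self::p or self::i or
                               self::em or self::strong or self::b or
                               self::ol or self::ul or self::li or self::a)]">
        <xsl:apply-templates/>
    </xsl:template>
</xsl:stylesheet>
```

Applied to the sample input, the span and font tags should disappear while "Hello" survives, giving (ignoring whitespace differences) a description containing a single p with the text "Hello World!" followed by the unchanged a element.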

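An equivalent (and slightly cleaner) solution relies on template priorities. The whitelist template (default priority 0) beats the catch-all * template (default priority -0.5), which in turn beats the copy template explicitly lowered to -3. That last template therefore only ever handles attributes, text, comments and processing instructions: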

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:template match="@*|node()" priority="-3">
        <xsl:copy/>
    </xsl:template>
    <xsl:template match="description|p|i|em|strong|b|ol|ul|li|a">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="*"/>
</xsl:stylesheet>

The opposite approach is to blacklist the unwanted elements:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="font|span"/>
</xsl:stylesheet>

Again, add an apply-templates to the final template if you want to allow children of the skipped elements.
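
As a sketch of that change for the blacklist stylesheet (assuming font and span are still the only elements being skipped), the final template would become:

```xml
<!-- Replaces the empty font|span template: drop the tags themselves,
     but keep processing whatever is inside them -->
<xsl:template match="font|span">
    <xsl:apply-templates/>
</xsl:template>
```

Since xsl:apply-templates without a select attribute processes child nodes only, any attributes on the skipped elements are dropped along with the tags.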

Upvotes: 8
