staio

Reputation: 51

How to find <a> elements with specific keywords in href using XSL?

I'm trying to write an HTML to BBCode converter, but being a complete newb in XSL I need help breaking the ice. Here's what I've got so far:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" omit-xml-declaration="yes" indent="no" encoding="UTF-8"/>

<xsl:template match="h1|h2|h3|h4">[h]<xsl:apply-templates/>[/h]</xsl:template>
<xsl:template match="b|strong">[b]<xsl:apply-templates/>[/b]</xsl:template>
<xsl:template match="i|em">[i]<xsl:apply-templates/>[/i]</xsl:template>
<xsl:template match="u">[u]<xsl:apply-templates/>[/u]</xsl:template>
<xsl:template match="br">&#10;</xsl:template>
<xsl:template match="p">&#10;<xsl:apply-templates/>&#10;&#10;</xsl:template>
<xsl:template match="img">[img]<xsl:value-of select="@src"/>[/img]</xsl:template>
<xsl:template match="a">[url="<xsl:value-of select="@href"/>"]<xsl:apply-templates/>[/url]</xsl:template>

<xsl:template match="style|script"></xsl:template>

</xsl:stylesheet>

How would you match <a> that have a specific keyword in href and remove those nodes, while keeping others? And then check if those <a> are empty or not, thus deciding whether to use [url]http://foo[/url] or [url="http://foo"]bar[/url]?

For example:

<a href="http://spammycrap.tld">Foo</a>
<a href="http://empty.tld"></a>
<a href="http://okay.tld">Baz</a>

Desired output:

[url]http://empty.tld[/url]
[url="http://okay.tld"]Baz[/url]

Upvotes: 2

Views: 357

Answers (3)

Dimitre Novatchev

Reputation: 243459

This transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="a[starts-with(@href, 'http://spammy')]"/>

 <xsl:template match="a[not(*|text()[normalize-space(.)])]">
  <xsl:text>[url]</xsl:text>
    <xsl:value-of select="@href"/>
  <xsl:text>[/url]&#xA;</xsl:text>
 </xsl:template>

 <xsl:template match="a">
  <xsl:text>[url="</xsl:text>
  <xsl:value-of select="@href"/>"]<xsl:text/>
  <xsl:value-of select="."/>
  <xsl:text>[/url]&#xA;</xsl:text>
 </xsl:template>
</xsl:stylesheet>

When applied on this XML document:

<html>
    <a href="http://spammycrap.tld">Foo</a>
    <a href="http://empty.tld"></a>
    <a href="http://empty2.tld">    </a>
    <a href="http://okay.tld">Baz</a>
</html>

produces the wanted, correct result:

[url]http://empty.tld[/url]
[url]http://empty2.tld[/url]
[url="http://okay.tld"]Baz[/url]

Upvotes: 0

user1726343

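To single out anchors by their href attribute, add a predicate to your match pattern. This one matches only the anchors whose href does not contain the undesired string:

<xsl:template match="a[not(contains(@href,'Foo'))]">...

Foo could be spammycrap.tld or whatever. Anchors that do contain it no longer match this template and fall through to whatever other rule matches them (the built-in rules would still output their text), so give them an empty template if they should disappear entirely.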

Additionally, you can specify different templates for empty and non-empty anchors. So for non-empty anchors, you would use:

<xsl:template match="a[not(contains(@href,'Foo')) and not(count(node()) = 0)]">...

followed by the template for non-empty anchors. Then for empty anchors:

<xsl:template match="a[not(contains(@href,'Foo')) and not(node())]">...

followed by the template for empty anchors.

Overall, this becomes:

<xsl:template match="a[not(contains(@href,'Foo')) and not(count(node()) = 0)]">[url="<xsl:value-of select="@href"/>"]<xsl:apply-templates/>[/url]</xsl:template>

<xsl:template match="a[not(contains(@href,'Foo')) and not(node())]">[url]<xsl:value-of select="@href"/>[/url]</xsl:template>
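Put together as a complete stylesheet, this might look like the minimal sketch below. It is not part of the original answer: the keyword 'spammycrap' is taken from the question's example, the empty first template is an addition so that unwanted anchors produce no output at all, and node() is used as shorthand for not(count(node()) = 0).

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Assumption: 'spammycrap' stands in for the unwanted keyword.
       The empty body means such anchors are dropped, text included. -->
  <xsl:template match="a[contains(@href, 'spammycrap')]"/>

  <!-- Wanted anchors that have at least one child node. -->
  <xsl:template match="a[not(contains(@href, 'spammycrap')) and node()]">[url="<xsl:value-of select="@href"/>"]<xsl:apply-templates/>[/url]</xsl:template>

  <!-- Wanted anchors that have no child nodes at all. -->
  <xsl:template match="a[not(contains(@href, 'spammycrap')) and not(node())]">[url]<xsl:value-of select="@href"/>[/url]</xsl:template>
</xsl:stylesheet>
```

The three patterns are mutually exclusive, so no anchor can match more than one of them. Note that an anchor containing only whitespace still has a text node child and is treated as non-empty here, unless whitespace is stripped first (for example with xsl:strip-space).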

Upvotes: 2

Ian Roberts

Reputation: 122364

You can ignore particular elements using an empty template, e.g.

<xsl:template match="a[contains(@href, 'badurl')]" />

To find non-empty a elements you could use

<xsl:template match="a[*|text()[normalize-space(.)]]">
  <xsl:text>[url="</xsl:text>
  <xsl:value-of select="@href"/>
  <xsl:text>"]</xsl:text>
  <xsl:apply-templates/>
  <xsl:text>[/url]</xsl:text>
</xsl:template>

which matches any anchor that has child elements or text nodes that are not entirely whitespace. Anchors that do not match this pattern will be picked up by a generic match="a" template:

<xsl:template match="a">[url]<xsl:value-of select="@href" />[/url]</xsl:template>
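One thing to watch when combining these three templates: an unwanted anchor that also has content matches both of the first two patterns, and both have the same default priority (0.5). XSLT 1.0 treats that as a conflict, which a processor may either report as an error or resolve by picking the template that comes last in the stylesheet. The sketch below is one way to avoid relying on template order, by giving the empty template an explicit priority; the priority value and the 'badurl' keyword are illustrative assumptions.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Unwanted anchors. The explicit priority beats the default 0.5 of
       the predicate pattern below, so an unwanted anchor with content
       is dropped instead of causing a template conflict. -->
  <xsl:template match="a[contains(@href, 'badurl')]" priority="1"/>

  <!-- Non-empty anchors (default priority 0.5). -->
  <xsl:template match="a[*|text()[normalize-space(.)]]">
    <xsl:text>[url="</xsl:text>
    <xsl:value-of select="@href"/>
    <xsl:text>"]</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>[/url]</xsl:text>
  </xsl:template>

  <!-- Everything else, i.e. empty anchors (default priority 0). -->
  <xsl:template match="a">[url]<xsl:value-of select="@href"/>[/url]</xsl:template>
</xsl:stylesheet>
```

The generic match="a" template needs no explicit priority, since a plain element name has a lower default priority (0) than a pattern with a predicate.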

Upvotes: 1
