Reputation: 51
I'm trying to write an HTML to BBCode converter, but being a complete newb in XSL I need help breaking the ice. Here's what I've got so far:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" omit-xml-declaration="yes" indent="no" encoding="UTF-8"/>
<xsl:template match="h1|h2|h3|h4">[h]<xsl:apply-templates/>[/h]</xsl:template>
<xsl:template match="b|strong">[b]<xsl:apply-templates/>[/b]</xsl:template>
<xsl:template match="i|em">[i]<xsl:apply-templates/>[/i]</xsl:template>
<xsl:template match="u">[u]<xsl:apply-templates/>[/u]</xsl:template>
<xsl:template match="br"><xsl:text>&#10;</xsl:text></xsl:template>
<xsl:template match="p"><xsl:text>&#10;</xsl:text><xsl:apply-templates/><xsl:text>&#10;</xsl:text></xsl:template>
<xsl:template match="img">[img]<xsl:value-of select="@src"/>[/img]</xsl:template>
<xsl:template match="a">[url="<xsl:value-of select="@href"/>"]<xsl:apply-templates/>[/url]</xsl:template>
<xsl:template match="style|script"></xsl:template>
</xsl:stylesheet>
How would you match <a> elements that have a specific keyword in their href and remove those nodes, while keeping the others? And how would you then check whether the remaining <a> elements are empty, to decide between [url]http://foo[/url] and [url="http://foo"]bar[/url]?
For example:
<a href="http://spammycrap.tld">Foo</a>
<a href="http://empty.tld"></a>
<a href="http://okay.tld">Baz</a>
Desired output:
[url]http://empty.tld[/url]
[url="http://okay.tld"]Baz[/url]
Upvotes: 2
Views: 357
Reputation: 243459
This transformation:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="a[starts-with(@href, 'http://spammy')]"/>
<xsl:template match="a[not(*|text()[normalize-space(.)])]">
<xsl:text>[url]</xsl:text>
<xsl:value-of select="@href"/>
<xsl:text>[/url]
</xsl:text>
</xsl:template>
<xsl:template match="a">
<xsl:text>[url="</xsl:text>
<xsl:value-of select="@href"/>"]<xsl:text/>
<xsl:value-of select="."/>
<xsl:text>[/url]
</xsl:text>
</xsl:template>
</xsl:stylesheet>
When applied to this XML document:
<html>
<a href="http://spammycrap.tld">Foo</a>
<a href="http://empty.tld"></a>
<a href="http://empty2.tld"> </a>
<a href="http://okay.tld">Baz</a>
</html>
produces the wanted, correct result:
[url]http://empty.tld[/url]
[url]http://empty2.tld[/url]
[url="http://okay.tld"]Baz[/url]
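One caveat with the stylesheet above: its first two templates both get the default priority of 0.5, so an anchor that is spammy and also empty matches both. XSLT 1.0 treats that as an error, and a processor that recovers picks the template that occurs last, which here would print the spam URL. A minimal sketch of one way around this, assuming an explicit priority on the spam template is acceptable (the value 2 and the switch to method="text" are my choices, not part of the answer above):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <!-- Assumed priority: wins over the other templates, so spammy
       anchors are dropped even when they are also empty. -->
  <xsl:template match="a[starts-with(@href, 'http://spammy')]" priority="2"/>

  <!-- Anchors with no child elements and no non-whitespace text. -->
  <xsl:template match="a[not(*|text()[normalize-space(.)])]">
    <xsl:text>[url]</xsl:text>
    <xsl:value-of select="@href"/>
    <xsl:text>[/url]&#10;</xsl:text>
  </xsl:template>

  <!-- Every other anchor: URL in the tag, link text as the body. -->
  <xsl:template match="a">
    <xsl:text>[url="</xsl:text>
    <xsl:value-of select="@href"/>
    <xsl:text>"]</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>[/url]&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With this variant, an input such as <a href="http://spammycrap.tld"></a> should produce no output at all, regardless of the order of the templates.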
Upvotes: 0
Reputation:
To remove anchors that have an undesired string in their href attribute, expand your match XPath expression:
<xsl:template match="a[not(contains(@href,'Foo'))]">...
Here Foo could be spammycrap.com or whatever.
Additionally, you can specify different templates for empty and non-empty anchors. For non-empty anchors, you would use:
<xsl:template match="a[not(contains(@href,'Foo')) and not(count(node()) = 0)]">...
followed by the template body for non-empty anchors. Then, for empty anchors:
<xsl:template match="a[not(contains(@href,'Foo')) and not(node())]">...
followed by the template body for empty anchors.
Overall, this becomes:
<xsl:template match="a[not(contains(@href,'Foo')) and not(count(node()) = 0)]">[url="<xsl:value-of select="@href"/>"]<xsl:apply-templates/>[/url]</xsl:template>
<xsl:template match="a[not(contains(@href,'Foo')) and not(node())]">[url]<xsl:value-of select="@href"/>[/url]</xsl:template>
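Note that anchors excluded by these predicates are not matched by either template, so they fall through to whatever else matches them. With only the built-in rules, their link text would still be copied to the output. A sketch of one way to drop them completely, assuming an extra empty template with a raised priority (the priority value is an assumption; 'Foo' is the same placeholder keyword as above):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Unwanted anchors: an empty template emits nothing, not even the link text.
       The explicit priority (assumed value) beats the two templates below. -->
  <xsl:template match="a[contains(@href,'Foo')]" priority="2"/>

  <!-- Non-empty anchors: at least one child node of any kind. -->
  <xsl:template match="a[node()]">[url="<xsl:value-of select="@href"/>"]<xsl:apply-templates/>[/url]</xsl:template>

  <!-- Empty anchors: no child nodes at all. -->
  <xsl:template match="a[not(node())]">[url]<xsl:value-of select="@href"/>[/url]</xsl:template>
</xsl:stylesheet>
```

Because the spam template now takes precedence, the other two no longer need the not(contains(...)) test. Keep in mind that a[not(node())] treats an anchor containing only whitespace as non-empty unless whitespace is stripped from the input, for example with xsl:strip-space.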
Upvotes: 2
Reputation: 122364
You can ignore particular elements using an empty template, e.g.:
<xsl:template match="a[contains(@href, 'badurl')]" />
To find non-empty a elements you could use:
<xsl:template match="a[*|text()[normalize-space(.)]]">
<xsl:text>[url="</xsl:text>
<xsl:value-of select="@href"/>
<xsl:text>"]</xsl:text>
<xsl:apply-templates/>
<xsl:text>[/url]</xsl:text>
</xsl:template>
which matches any anchor that has child elements or text nodes that are not entirely whitespace. Anchors that do not match this pattern will be picked up by a generic match="a" template:
<xsl:template match="a">[url]<xsl:value-of select="@href" />[/url]</xsl:template>
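When the three templates above are combined, the two predicate patterns share the same default priority, so a spammy anchor that also has content matches both and the outcome depends on template order. As an alternative sketch, the same decisions can live in a single template with xsl:choose, where the order of the tests is explicit. The parameter name spam-keyword is hypothetical, and 'badurl' is the placeholder keyword from above:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical parameter; can be overridden by the caller. -->
  <xsl:param name="spam-keyword" select="'badurl'"/>

  <xsl:template match="a">
    <xsl:choose>
      <!-- Tested first: spammy links emit nothing, including their text. -->
      <xsl:when test="contains(@href, $spam-keyword)"/>
      <!-- Has child elements or non-whitespace text. -->
      <xsl:when test="*|text()[normalize-space(.)]">
        <xsl:text>[url="</xsl:text>
        <xsl:value-of select="@href"/>
        <xsl:text>"]</xsl:text>
        <xsl:apply-templates/>
        <xsl:text>[/url]</xsl:text>
      </xsl:when>
      <!-- Empty or whitespace-only anchor: use the URL itself. -->
      <xsl:otherwise>
        <xsl:text>[url]</xsl:text>
        <xsl:value-of select="@href"/>
        <xsl:text>[/url]</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

A side benefit is that the keyword can be a parameter, which XSLT 1.0 does not allow inside a match pattern. The parameter should not be set to an empty string, since contains() is true for an empty second argument and every anchor would then be dropped.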
Upvotes: 1