Reputation: 1129
I've been struggling to find the best way to get rid of some specific tags. Currently I use repeated find/replace passes with regexes, but there has to be a better way. It's just not clear to me how to do it in XSLT directly.
Take the following example:
<local xml:lang="en">[Some Indicator]<div class="tab"/>some more content here</local>
I have quite a few of these, and they all follow the same structure, where [Some Indicator] is a kind of list identifier and can be any of the following:
I want to get rid of all of these without having to manually find/replace a few hundred times. I've tried xsl:analyze-string, but it replaces every match regardless of its position.
Some examples:
<some_nodes_above>
<local xml:lang="en">1<div class="tab"/>some more content here</local>
<local xml:lang="en">2.<div class="tab"/>some more content here</local>
<local xml:lang="fr">2-A<div class="tab"/>some more content here</local>
<local xml:lang="de"><div class="tab"/>some more content here</local>
</some_nodes_above>
should become:
<some_nodes_above>
<local xml:lang="en">some more content here</local>
<local xml:lang="en">some more content here</local>
<local xml:lang="fr">some more content here</local>
<local xml:lang="de">some more content here</local>
</some_nodes_above>
So I'm looking for an XSLT 2.0 stylesheet that says something like 'whenever you see a local node whose content starts with one of the indicators followed by a tab div, strip the indicator and the tab div'. I'm not looking for a full solution for the example, just something to point me in the right direction. If I know how it works for one pattern, I can probably figure out the rest myself.
Thanks in advance.
Upvotes: 3
Views: 227
Reputation: 243549
This transformation:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity rule: copy every node that no other template matches -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop a leading indicator text node that is directly followed by a tab div.
       The alternation is grouped so that ^ and $ anchor both branches. -->
  <xsl:template match=
    "local/node()[1]
       [self::text()
        and
          following-sibling::node()[1]
             [self::div and @class eq 'tab']
        and
          (
            matches(., '^(\d\.?|.\-.)$')
           or
            string-length(.) eq 1
            and
             string-to-codepoints(.) = (57600 to 58607)
          )
       ]"/>

  <!-- Drop the tab div when it is the first child of local (no indicator),
       or when it directly follows a leading indicator text node -->
  <xsl:template match=
    "local/div[@class eq 'tab']
       [not(preceding-sibling::node())
        or
          preceding-sibling::node()[1]
             [self::text()
              and
                not(preceding-sibling::node())
              and
                (
                  matches(., '^(\d\.?|.\-.)$')
                 or
                  string-length(.) eq 1
                  and
                   string-to-codepoints(.) = (57600 to 58607)
                )
             ]
       ]"/>
</xsl:stylesheet>
when applied to the provided XML document:
<some_nodes_above>
<local xml:lang="en"
>1<div class="tab"/>some more content here</local>
<local xml:lang="en"
>2.<div class="tab"/>some more content here</local>
<local xml:lang="fr"
>2-A<div class="tab"/>some more content here</local>
<local xml:lang="de"
><div class="tab"/>some more content here</local>
</some_nodes_above>
produces the wanted, correct result:
<some_nodes_above>
<local xml:lang="en">some more content here</local>
<local xml:lang="en">some more content here</local>
<local xml:lang="fr">some more content here</local>
<local xml:lang="de">some more content here</local>
</some_nodes_above>
Upvotes: 2
Reputation: 13450
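Replace (?<=<local xml:lang="\w+">)[^<]*<div class="tab"/>
with an empty string.
Enable the multiline regex option. Note that the variable-length lookbehind needs a regex engine that supports it, such as .NET.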
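For anyone who wants to script this find/replace instead of running it in an editor, here is a minimal sketch in Python. It is only an illustration, not part of the answer above: `strip_indicators` is a hypothetical helper name, the opening tag is kept via a capture group because Python's `re` module only accepts fixed-width lookbehinds, and `[^<]*` is used for the indicator so that a `local` element with no indicator before the tab div is handled too. Like any regex over XML, it assumes the markup is written exactly as in the question (same attribute order, double quotes, self-closing div).

```python
import re

# Capture the opening <local ...> tag, then match an optional run of plain
# text (the indicator) followed by the tab div. Only the captured tag is kept.
PATTERN = re.compile(r'(<local xml:lang="\w+">)[^<]*<div class="tab"/>')


def strip_indicators(xml_text: str) -> str:
    """Remove the leading indicator and the tab div from each <local> element."""
    return PATTERN.sub(r"\1", xml_text)


if __name__ == "__main__":
    sample = (
        '<local xml:lang="en">1<div class="tab"/>some more content here</local>\n'
        '<local xml:lang="en">2.<div class="tab"/>some more content here</local>\n'
        '<local xml:lang="fr">2-A<div class="tab"/>some more content here</local>\n'
        '<local xml:lang="de"><div class="tab"/>some more content here</local>'
    )
    print(strip_indicators(sample))
```

Because `[^<]*` accepts any text before the div, this sketch strips whatever precedes the tab div, not just the known indicator patterns. Tighten that part of the pattern if other leading text must be preserved.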
Upvotes: 2