Reputation: 11
I'm trying to extract "plain text" from "annotated text" (that is, simple content from complex content).
This is the input XML I have:
<l>string</l>
<l>string<g><b/>string2</g></l>
<l>string<g><b/>string2<b/>string3</g></l>
<l>string<b/>string2<b/>string3</l>
and this is the output I need:
<word>string</word>
<word>string string2</word>
<word>string string2 string3</word>
<word>string string2 string3</word>
Essentially: (i) I do not need the <g> element, and (ii) each empty element should be replaced by a blank space.
Many thanks!
Upvotes: 1
Views: 198
Reputation: 70608
You could achieve this with a modified identity transform (one that matches every node but does not copy elements), overriding it for your special cases, like so:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>
  <!-- Replace elements under root element with word element -->
  <xsl:template match="/*/*">
    <word>
      <xsl:apply-templates select="node()"/>
    </word>
  </xsl:template>
  <!-- Match, but don't copy, elements -->
  <xsl:template match="@*|node()">
    <xsl:apply-templates select="@*|node()"/>
  </xsl:template>
  <!-- Copy out text nodes -->
  <xsl:template match="text()">
    <xsl:copy/>
  </xsl:template>
  <!-- Replace empty element by space -->
  <xsl:template match="*[not(node())]">
    <xsl:text> </xsl:text>
  </xsl:template>
</xsl:stylesheet>
When applied to the following XML:
<data>
<l>string</l>
<l>string<g><b/>string2</g></l>
<l>string<g><b/>string2<b/>string3</g></l>
<l>string<b/>string2<b/>string3</l>
</data>
The output is as follows:
<word>string</word>
<word>string string2</word>
<word>string string2 string3</word>
<word>string string2 string3</word>
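As a side note, XSLT's built-in template rules already traverse elements without copying them and output text nodes as they are, so the two generic templates in the stylesheet could probably be left out. The following is a minimal sketch of that shorter variant, assuming an XSLT 1.0 processor and the same <data> input as above; it is an alternative formulation, not a different technique:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>

  <!-- Each child of the root element becomes a word element -->
  <xsl:template match="/*/*">
    <word>
      <xsl:apply-templates/>
    </word>
  </xsl:template>

  <!-- An element with no child nodes becomes a single space -->
  <xsl:template match="*[not(node())]">
    <xsl:text> </xsl:text>
  </xsl:template>

  <!-- Everything else falls through to the built-in rules:
       other elements (such as g) are traversed but not copied,
       and text nodes are written to the output unchanged -->
</xsl:stylesheet>
```

One caveat applies to both versions: an empty element directly under the root (for example <l/>) matches both templates with the same default priority, so the result for that case depends on how the processor resolves the conflict. Adding an explicit priority attribute to one of the templates would settle it if that case matters.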
Upvotes: 2