Reputation: 7521
I'm trying to extract the text of one node of interest (here big-structured-text), but within this node there are some children I would like to skip (here title, subtitle, and code). Those "to remove" nodes can have children of their own.
Sample data:
<root>
<big-structured-text>
<section>
<title>Introduction</title>
In this part we describe Australian foreign policy....
<subsection>
<subtitle>Historical context</subtitle>
After its independence...
<meta>
<keyword>foreign policy</keyword>
<keyword>australia</keyword>
<code>
<value>XXHY-123</value>
<label>IRRN</label>
</code>
</meta>
</subsection>
</section>
</big-structured-text>
<!-- ... -->
<big-structured-text>
<!-- ... -->
</big-structured-text>
</root>
So far I've tried:
<xsl:for-each select="//big-structured-text">
  <text>
    <xsl:value-of select=".//*[not(*)
        and not(ancestor-or-self::code)
        and not(ancestor-or-self::subtitle)
        and not(ancestor-or-self::title)
        ]" />
  </text>
</xsl:for-each>
but this only takes the nodes that don't have any children: it picks up keyword, but not the text following the Introduction title.
I've also tried:
<xsl:for-each select="//big-structured-text">
  <text>
    <xsl:value-of select=".//*[
        not(ancestor-or-self::code)
        and not(ancestor-or-self::subtitle)
        and not(ancestor-or-self::title)
        ]" />
  </text>
</xsl:for-each>
But this echoes the interesting text multiple times, and sometimes the uninteresting text too (every node is output once for itself and then once more for each of its ancestors).
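A minimal sketch of a fix that keeps the for-each approach. Both attempts select elements: the first keeps only leaf elements, so text sitting directly inside section or subsection is never reached; the second also selects ancestors such as section, whose string value already contains all descendant text (including the title), so text is repeated. Selecting text nodes instead visits each piece of text exactly once. The inner xsl:for-each is used because in XSLT 1.0 xsl:value-of on a node-set outputs only the string value of the first node. The surrounding stylesheet and the root wrapper element are assumptions about the desired output shape.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <!-- root wrapper is an assumption, mirroring the answer's output -->
    <root>
      <xsl:for-each select="//big-structured-text">
        <text>
          <!-- Select text nodes rather than elements: each piece of text is
               visited once, including text placed directly inside section
               or subsection next to child elements. A text node inside
               title, subtitle or code (at any depth) has that element as an
               ancestor and is skipped. -->
          <xsl:for-each select=".//text()[not(ancestor::title or ancestor::subtitle or ancestor::code)]">
            <xsl:value-of select="."/>
          </xsl:for-each>
        </text>
      </xsl:for-each>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

This selects the same text nodes, in document order, as the template-based answer; the exclusion list just lives in one XPath predicate instead of an empty template.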
Upvotes: 1
Views: 198
Reputation: 122364
Rather than for-each, you could approach this using templates. The default behaviour when you apply templates to an element node is simply to apply them recursively to all its child nodes (text nodes as well as other elements), and for a text node it is to output the text. Therefore all you need to do is create empty templates that squash the elements you don't want, and let the default templates do the rest.
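Those defaults are the built-in template rules defined in the XSLT 1.0 specification (section 5.8). For reference only, written out as ordinary templates they are equivalent to the following; nothing here needs to be added to your own stylesheet. Built-in rules have lower import precedence than any template you write, which is why an empty template for title, subtitle and code overrides them. The comment rule also explains why the comment placeholders in the sample produce no output.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Elements and the root node: just process the child nodes. -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text nodes and attributes: copy their string value to the output. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Comments and processing instructions: output nothing. -->
  <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>
```

Run on its own, this stylesheet simply dumps all the text of the input, which is exactly the behaviour the answer's empty templates carve exceptions out of.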
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <root>
      <xsl:apply-templates select="/root/big-structured-text" />
    </root>
  </xsl:template>

  <xsl:template match="big-structured-text">
    <text><xsl:apply-templates /></text>
  </xsl:template>

  <!-- empty template means anything inside any of these elements will be
       ignored -->
  <xsl:template match="title | subtitle | code" />
</xsl:stylesheet>
When run on your sample input this produces
<?xml version="1.0"?>
<root><text>
In this part we describe Australian foreign policy....
After its independence...
foreign policy
australia
</text><text>
</text></root>
You may wish to investigate the use of <xsl:strip-space> to get rid of some of the extraneous whitespace, but with mixed content you always have to be careful not to strip out too much.
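A sketch of what "careful" can look like: instead of elements="*", list only the containers whose whitespace-only text nodes are pure indentation. The element list below is an assumption based solely on the sample document. xsl:strip-space only removes text nodes that consist entirely of whitespace, so text such as "In this part we describe..." keeps its own leading and trailing newlines. meta is deliberately left out: the newlines between its keyword children are whitespace-only nodes, and stripping them would run "foreign policy" and "australia" together.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Strip whitespace-only text nodes only in these containers (assumed
       to hold indentation only). meta is NOT listed: its whitespace is
       what separates the keyword values in the output. -->
  <xsl:strip-space elements="big-structured-text section subsection"/>

  <xsl:template match="/">
    <root>
      <xsl:apply-templates select="/root/big-structured-text" />
    </root>
  </xsl:template>

  <xsl:template match="big-structured-text">
    <text><xsl:apply-templates /></text>
  </xsl:template>

  <!-- ignore these elements and everything inside them -->
  <xsl:template match="title | subtitle | code" />
</xsl:stylesheet>
```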
Upvotes: 2