Extract text through XSL by skipping content within given children

Question

I'm trying to extract the text of a particular node (here big-structured-text), but within this node there are some child elements I would like to skip (here title, subtitle and code). These "to remove" nodes can themselves have children.

Sample data:


    
        
            Introduction
            In this part we describe Australian foreign policy....
            
                Historical context
                After its independence...
                
                    foreign policy
                    australia
                    
                        XXHY-123
                        IRRN

So far I've tried:

but this only picks up the nodes that don't have any children: it takes keyword, but not the text following the Introduction title.
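A hypothetical stylesheet of the kind described (a reconstruction for illustration only, not the asker's actual code) shows why text goes missing. The predicate *[not(*)] selects only elements that have no element children, so a leaf such as keyword is visited, but a text node sitting directly inside an element that also contains a title child is never selected. The leaf descendants of title, subtitle and code are still output, which is the opposite of what is wanted.

```xml
<?xml version="1.0"?>
<!-- Hypothetical "leaf elements only" attempt, for illustration.
     Assumes big-structured-text is un-namespaced. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- *[not(*)] = elements with no element children (leaves).
         Text nodes that are siblings of element children are never reached,
         and leaf title/subtitle/code content is still printed. -->
    <xsl:for-each select="//big-structured-text//*[not(*)]">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```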

I've also tried:

But this echoes the interesting text multiple times, and sometimes the uninteresting text too (every node is iterated over once for itself and then once more per ancestor).
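A hypothetical loop over every descendant element (again an illustration, not the asker's code) reproduces this. In XSLT 1.0, xsl:value-of on an element outputs its string value, which is the concatenation of all its descendant text nodes. A text node nested several levels deep is therefore printed once for each enclosing element the loop visits, and excluding title, subtitle and code from the select does not help, because their text is already part of the string value of their ancestors.

```xml
<?xml version="1.0"?>
<!-- Hypothetical "for-each over all descendants" attempt, for illustration. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The unwanted elements are skipped as loop items, but value-of on
         any of their ancestors still includes their text, and nested text
         is repeated once per visited ancestor. -->
    <xsl:for-each select="//big-structured-text//*[not(self::title or self::subtitle or self::code)]">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```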

Ian Roberts · Accepted Answer

Rather than for-each, you could approach this using templates. When you apply templates to an element node, the default behaviour is simply to apply them recursively to all its child nodes (text nodes as well as other elements), and for a text node the default is to output its text. So all you need to do is create empty templates to squash the elements you don't want and let the default templates do the rest.
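A minimal XSLT 1.0 sketch of this approach, assuming the element names from the question (big-structured-text, title, subtitle, code) and no namespaces. The template for "/" that restricts processing to big-structured-text is an assumption; if the document contains no other text, the built-in rule for the root node would do the same job.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumption: only text inside big-structured-text is wanted. -->
  <xsl:template match="/">
    <xsl:apply-templates select="//big-structured-text"/>
  </xsl:template>

  <!-- Empty template: a matched element and everything beneath it,
       including nested children, produce no output. -->
  <xsl:template match="title | subtitle | code"/>

  <!-- Nothing else is needed: the built-in rules recurse into child
       elements and copy text nodes to the output. -->
</xsl:stylesheet>
```

Because the empty template never calls apply-templates, the recursion simply stops at title, subtitle and code, so their descendants are never visited.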

When run on your sample input, this produces:





            In this part we describe Australian foreign policy....


                After its independence...

                    foreign policy
                    australia

You may wish to investigate the use of to get rid of some of the extraneous whitespace, but with mixed content you always have to be careful not to strip out too much.
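One standard XSLT 1.0 instruction for trimming whitespace is xsl:strip-space, which removes whitespace-only text nodes from the source tree before any template runs. A sketch, assuming it is applied to all elements: the indentation between elements disappears, but text nodes that contain real content keep their own leading and trailing whitespace. The mixed-content risk is that a lone space separating two inline elements (for example a hypothetical <b>a</b> <i>b</i>) is also a whitespace-only node and is removed, so the words run together.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Drop whitespace-only text nodes in every element of the source. -->
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <xsl:apply-templates select="//big-structured-text"/>
  </xsl:template>

  <!-- Suppress unwanted elements and all their descendants. -->
  <xsl:template match="title | subtitle | code"/>
</xsl:stylesheet>
```

If that is too aggressive, listing only specific container elements in the elements attribute (instead of *) limits the stripping to places where whitespace is known to be insignificant.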
