Exclude nodes between elements with an extra twist

Question

I have a tricky XPath problem that I can't quite solve. Let's say I have the following:


  
      
        
          Some text
        
      
      
        
          
        
      
      
        
          Some more text
         
      
      
        
          
        
        
      
        
          
content
xyz

I want to select all nodes above the element which has a p tag inside the html with a "finalBlock" class, except for the ones that do not have content (node text, e.g. block id 789). However, this rule should only apply until the first node with content is encountered again; after that point, all empty elements should be included. This means that the input above should produce the following output:


  
      
        
          Some text
        
      
      
        
          
        
      
      
        
          Some more text
         
        
      
        
          
content
xyz

Where the element with an id of 789 was removed, but all others were kept. I've managed to craft the XPath query that excludes the block elements I want (empty ones), but am struggling with implementing the "between" rule. Any thoughts would be greatly appreciated!

Here's the expression excluding the empty block elements:

//block[html/p]/html/p[normalize-space(.) != '']

helderdarocha · Accepted Answer

This expression selects "the element which has a p tag inside the html, with a finalBlock class":

//*[html/p[@class='finalBlock']]

This one selects all the block nodes that precede it ("all nodes above" - which does not include the ancestor nodes):

//*[html/p[@class='finalBlock']]/preceding-sibling::*

You can add a predicate to restrict that to only the ones that have a non-empty p descendant:

//*[html/p[@class='finalBlock']]/preceding-sibling::*[descendant::p[string()]]

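And this one selects the ones that have an empty p descendant, except the one nearest to the finalBlock element (preceding-sibling is a reverse axis, so position 1 is the closest sibling):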

//*[html/p[@class='finalBlock']]/preceding-sibling::*[descendant::p[not(string())]][not(position() = 1)]

If you perform a union of the previous two expressions, you will obtain all the block nodes that satisfy the requirements you stated:

//*[html/p[@class='finalBlock']]/preceding-sibling::*[descendant::p[string()]] 
| //*[html/p[@class='finalBlock']]/preceding-sibling::*[descendant::p[not(string())]][not(position() = 1)]
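
To check what the union selects, here is a minimal sketch in Python. It is not the XPath itself: the standard library's ElementTree does not support the preceding-sibling axis, so the selection is emulated step by step. The sample document, its ids (other than 789) and the helper names `p_strings` and `select_blocks` are assumptions made for illustration, and the sketch assumes all blocks are direct children of the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the element names follow the expressions above,
# but the exact markup and the ids (except 789) are assumptions.
SAMPLE = """
<root>
  <block id="1"><html><p>Some text</p></html></block>
  <block id="2"><html><p></p></html></block>
  <block id="3"><html><p>Some more text</p></html></block>
  <block id="789"><html><p></p></html></block>
  <block id="5"><html><p class="finalBlock">content</p></html></block>
</root>
"""


def p_strings(block):
    """String values of the p elements inside a block (like string() on each p)."""
    # iter() also visits the block itself, which is harmless here
    # because block elements are not p elements.
    return ["".join(p.itertext()) for p in block.iter("p")]


def select_blocks(root):
    """Emulate the union of the two preceding-sibling expressions."""
    children = list(root)
    # Locate the child matching *[html/p[@class='finalBlock']].
    final_index = next(
        i for i, el in enumerate(children)
        if el.find("html/p[@class='finalBlock']") is not None
    )
    # Preceding siblings, nearest first, as the reverse axis orders them.
    preceding = children[:final_index][::-1]
    # [descendant::p[string()]]: at least one p with a non-empty string value.
    non_empty = [b for b in preceding if any(p_strings(b))]
    # [descendant::p[not(string())]]: at least one p with an empty string value.
    empty = [b for b in preceding if any(not s for s in p_strings(b))]
    # [not(position() = 1)]: drop the nearest empty block, keep the rest.
    kept_empty = empty[1:]
    # A union yields each node once, in document order.
    return [b for b in children[:final_index] if b in non_empty or b in kept_empty]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE.strip())
    print([b.get("id") for b in select_blocks(root)])  # ['1', '2', '3']
```

With this sample, block 789 is dropped because it is the nearest empty block before the finalBlock element, while the earlier empty block (id 2) is kept. Note that `string()` treats whitespace-only text as non-empty; if that matters, `normalize-space()` from the question's expression would be the stricter test.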
