Reputation: 61
I have a tricky XPath problem that I can't quite solve. Let's say I have the following:
<content>
<body>
<block id="123">
<html>
<p align="left">Some text</p>
</html>
</block>
<block id="abc8383">
<html>
<p></p>
</html>
</block>
<block id="456">
<html>
<p><span>Some more text</span></p>
</html>
</block>
<block id="789">
<html>
<p></p>
</html>
</block>
<block id="012356">
<html>
<p class="finalBlock"><h3>content</h3><span>xyz</span></p>
</html>
</block>
</body>
</content>
I want to select all nodes above the element whose html contains a p tag with a "finalBlock" class, except for the ones that have no content (no node text, e.g. block id 789). However, this exclusion should only apply, moving upward, until the first node with content is encountered; beyond that point the empty elements should all be included. This means that the input above should produce the following output:
<content>
<body>
<block id="123">
<html>
<p align="left">Some text</p>
</html>
</block>
<block id="abc8383">
<html>
<p></p>
</html>
</block>
<block id="456">
<html>
<p><span>Some more text</span></p>
</html>
</block>
<block id="012356">
<html>
<p class="finalBlock"><h3>content</h3><span>xyz</span></p>
</html>
</block>
</body>
</content>
Here the element with an id of 789 was removed, but all others were kept. I've managed to craft an XPath query that excludes the empty block elements, but I am struggling to implement the "between" rule. Any thoughts would be greatly appreciated!
Here's the expression excluding the empty block elements:
//block[html/p]/html/p[normalize-space(.) != '']
Upvotes: 2
Views: 105
Reputation: 23637
This expression selects "the element which has a p tag inside the html, with a finalBlock class", which is <block id="012356">:
//*[html/p[@class='finalBlock']]
This one selects all the block nodes that precede it ("all nodes above" - which does not include the ancestor nodes):
//*[html/p[@class='finalBlock']]/preceding-sibling::*
You can add a predicate to restrict that to only the ones that have a non-empty p descendant:
//*[html/p[@class='finalBlock']]/preceding-sibling::*[descendant::p[string()]]
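And the ones that have an empty p descendant, except the nearest one (preceding-sibling is a reverse axis, so position() = 1 is the empty block closest to the finalBlock element):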
//*[html/p[@class='finalBlock']]/preceding-sibling::*[descendant::p[not(string())]][not(position() = 1)]
If you perform a union of the previous two expressions, you will obtain all the block nodes that satisfy the requirements you stated:
//*[html/p[@class='finalBlock']]/preceding-sibling::*[descendant::p[string()]]
| //*[html/p[@class='finalBlock']]/preceding-sibling::*[descendant::p[not(string())]][not(position() = 1)]
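The union above drops only the single nearest empty block, which is what the sample needs. As a cross-check of the rule as I read it from the question (skip empty blocks walking upward from the finalBlock element until the first block with text, then keep everything), here is a minimal Python sketch. It uses the standard library's xml.etree.ElementTree, which has no preceding-sibling axis, so the walk is done in plain Python. The names SAMPLE, has_text and select_blocks are made up for illustration, and "empty" is assumed to mean "no non-whitespace text", as in the normalize-space() test from the question.

```python
import xml.etree.ElementTree as ET

# The sample document from the question, condensed onto fewer lines.
SAMPLE = """<content><body>
<block id="123"><html><p align="left">Some text</p></html></block>
<block id="abc8383"><html><p></p></html></block>
<block id="456"><html><p><span>Some more text</span></p></html></block>
<block id="789"><html><p></p></html></block>
<block id="012356"><html><p class="finalBlock"><h3>content</h3><span>xyz</span></p></html></block>
</body></content>"""


def has_text(block):
    # Assumption: a block counts as non-empty when any <p> below it
    # contains non-whitespace text (mirrors normalize-space(.) != '').
    return any("".join(p.itertext()).strip() for p in block.iter("p"))


def select_blocks(root):
    """Return the blocks above the finalBlock block, per the stated rule."""
    blocks = list(root.find("body"))
    # Index of the first block whose html/p carries class="finalBlock".
    final_idx = next(
        i for i, b in enumerate(blocks)
        if b.find("html/p[@class='finalBlock']") is not None
    )
    kept = []
    seen_content = False
    # Walk upward (nearest sibling first), skipping empty blocks until
    # the first block with text; after that, keep every block.
    for block in reversed(blocks[:final_idx]):
        if has_text(block):
            seen_content = True
        if seen_content:
            kept.append(block)
    kept.reverse()  # restore document order
    return kept


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print([b.get("id") for b in select_blocks(root)])
```

Unlike the position() = 1 test, this also skips a run of several consecutive empty blocks directly above the finalBlock element, if that is what the rule is meant to cover.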
Upvotes: 1