Reputation: 258
Example of the markup:
<div class="post-content">
  <p>
    <moredepth>
      <...>
        <span class="image-container float_right">
          <div class="some_element">
            image1
          </div>
          <p>do not need this</p>
        </span>
        <div class="image-container float_right">
          image2
        </div>
        <p>text1</p>
        <li>text2</li>
      </...>
    </moredepth>
  </p>
</div>
The worst part is that the "image-container" elements can appear at any depth.
The XPath I am trying to use:
//div[contains(@class, 'post-content')]//*[not(contains(@class, 'image-container'))]
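To see why this attempt falls short, here is a small sketch that emulates the predicate in Python. It is only an illustration under assumptions: the standard library's `xml.etree.ElementTree` does not support `contains()` or the ancestor axes, so the class test is written by hand, and the `<...>` placeholder (not valid XML) is replaced by a hypothetical `<inner>` element.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the question's markup. <inner> is a hypothetical
# replacement for the <...> placeholder, which is not valid XML.
MARKUP = """<div class="post-content">
  <p>
    <moredepth>
      <inner>
        <span class="image-container float_right">
          <div class="some_element">image1</div>
          <p>do not need this</p>
        </span>
        <div class="image-container float_right">image2</div>
        <p>text1</p>
        <li>text2</li>
      </inner>
    </moredepth>
  </p>
</div>"""

root = ET.fromstring(MARKUP)  # root is the post-content div


def has_class(el, name):
    # Mirrors contains(@class, name); a missing @class behaves like "".
    return name in el.get("class", "")


# Emulates //div[contains(@class, 'post-content')]//*[not(contains(@class, 'image-container'))]
# Only each element's OWN class is tested, so the descendants of an
# image-container (some_element and the <p> inside the span) still pass.
attempt = [el for el in root.iter()
           if el is not root and not has_class(el, "image-container")]

for el in attempt:
    print(el.tag, el.get("class"), (el.text or "").strip())
```

The span and the second div are dropped, but "image1" and "do not need this" still come back, because nothing in the predicate looks at an element's ancestors.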
What XPath should I use to exclude the "image-container" elements themselves, together with "some_element" and any other descendants of "image-container" at any depth?
The output for this example should be:
<p>
  <moredepth>
    <...>
      <p>text1</p>
      <li>text2</li>
    </...>
  </moredepth>
</p>
P.S. Is it possible to make such a selection using CSS?
Upvotes: 6
Views: 4255
Reputation: 22617
XPath does not let you manipulate a fragment of XML once a path expression has returned it to you. So you cannot select moredepth:
//moredepth
without also getting the whole element node as a result, including all the descendant nodes you would like to exclude:
<moredepth>
  <span class="image-container float_right">
    <div class="some_element">
      image1
    </div>
    <p>do not need this</p>
  </span>
  <div class="image-container float_right">
    image2
  </div>
  <p>text1</p>
  <li>text2</li>
</moredepth>
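A quick way to see that a returned node carries its whole subtree is to select moredepth and serialize it. This sketch uses Python's standard `xml.etree.ElementTree`, whose limited path syntax (`.//moredepth`) roughly corresponds to `//moredepth`; the wrapping `div`/`p` is the answer's simplified markup, assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET

# The answer's simplified markup, wrapped so that moredepth is not the root.
MARKUP = """<div class="post-content"><p><moredepth>
  <span class="image-container float_right">
    <div class="some_element">image1</div>
    <p>do not need this</p>
  </span>
  <div class="image-container float_right">image2</div>
  <p>text1</p>
  <li>text2</li>
</moredepth></p></div>"""

root = ET.fromstring(MARKUP)
# Roughly //moredepth in ElementTree's limited path syntax.
moredepth = root.find(".//moredepth")
# The selected node owns its entire subtree, so serializing it brings back
# the image-container elements and everything inside them.
serialized = ET.tostring(moredepth, encoding="unicode")
print(serialized)
```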
What you can do is select only the child nodes of moredepth:
//div[contains(@class, 'post-content')]/p/moredepth/*[not(contains(@class,'image-container'))]
which yields (individual results separated by -------):
<p>text1</p>
-----------------------
<li>text2</li>
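The child-only selection can be sketched the same way in Python with `xml.etree.ElementTree`. This is an emulation, not an XPath engine: iterating over an element visits its direct children, which corresponds to `moredepth/*`, and the class test stands in for `not(contains(@class,'image-container'))`.

```python
import xml.etree.ElementTree as ET

# The answer's simplified markup (assumed well-formed XML).
MARKUP = """<div class="post-content"><p><moredepth>
  <span class="image-container float_right">
    <div class="some_element">image1</div>
    <p>do not need this</p>
  </span>
  <div class="image-container float_right">image2</div>
  <p>text1</p>
  <li>text2</li>
</moredepth></p></div>"""

root = ET.fromstring(MARKUP)
moredepth = root.find("./p/moredepth")

# Emulates moredepth/*[not(contains(@class,'image-container'))]:
# look at direct children only and drop those whose class matches.
children = [el for el in moredepth
            if "image-container" not in el.get("class", "")]

for el in children:
    # tostring() includes the trailing whitespace (tail), so strip it.
    print(ET.tostring(el, encoding="unicode").strip())
```

Because only direct children are tested, this works for the simplified markup; an image-container nested one level deeper would come back inside its parent element.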
Upvotes: 3
Reputation: 23637
You can apply the Kaysian method for computing the intersection of two node-sets. You have two sets:
A: The elements that descend from //div[contains(@class, 'post-content')], excluding that div itself (since you don't want the root div):
//*[ancestor::div[contains(@class, 'post-content')]]
B: The elements that neither have a class containing image-container nor descend from such an element. The test uses ancestor-or-self, so it includes the current element (since you want to exclude the entire subtree, including the div and span themselves):
//*[not(ancestor-or-self::*[contains(@class, 'image-container')])]
The intersection of those two sets is the solution to your problem. The formula of the Kaysian method is A[count(. | B) = count(B)]. Applied to your problem, the expression you need is:
//*[ancestor::div[contains(@class, 'post-content')]]
[ count(. | //*[not(ancestor-or-self::*[contains(@class, 'image-container')])])
=
count(//*[not(ancestor-or-self::*[contains(@class, 'image-container')])]) ]
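Why the count comparison acts as an intersection: inside the predicate, `.` is a single node x taken from A, and `. | B` is the union of {x} with B. Since node-sets contain no duplicates, the union grows by one node exactly when x is not already in B:

```latex
\[
  \bigl|\{x\} \cup B\bigr| =
  \begin{cases}
    |B|     & \text{if } x \in B,\\
    |B| + 1 & \text{if } x \notin B,
  \end{cases}
  \qquad\text{so}\qquad
  \{\, x \in A : |\{x\} \cup B| = |B| \,\} = A \cap B .
\]
```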
The combined expression selects the following elements from your example code:
/div/p
/div/p/moredepth
/div/p/moredepth/...
/div/p/moredepth/.../p
/div/p/moredepth/.../li
excluding the span and div that match the unwanted class, along with all of their descendants.
You can then add extra steps to the expression to filter out exactly which text or nodes you need.
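To check the selection, here is a Python sketch that builds sets A and B by hand and intersects them with the same count test. It is an emulation under assumptions: `xml.etree.ElementTree` has no ancestor axis, so a child-to-parent map stands in for it, and the `<...>` placeholder is replaced by a hypothetical `<inner>` element. An XPath 1.0 engine (for example the one in lxml) would evaluate the expression directly.

```python
import xml.etree.ElementTree as ET

# Question's markup; <inner> is a hypothetical stand-in for <...>.
MARKUP = """<div class="post-content">
  <p>
    <moredepth>
      <inner>
        <span class="image-container float_right">
          <div class="some_element">image1</div>
          <p>do not need this</p>
        </span>
        <div class="image-container float_right">image2</div>
        <p>text1</p>
        <li>text2</li>
      </inner>
    </moredepth>
  </p>
</div>"""

root = ET.fromstring(MARKUP)
# ElementTree has no parent pointers, so map child -> parent to emulate
# the ancestor and ancestor-or-self axes.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(el):
    while el in parent_of:
        el = parent_of[el]
        yield el


def has_class(el, name):
    # Mirrors contains(@class, name); a missing @class acts like "".
    return name in el.get("class", "")


everything = list(root.iter())  # document order, like //*

# A: //*[ancestor::div[contains(@class, 'post-content')]]
A = [el for el in everything
     if any(a.tag == "div" and has_class(a, "post-content") for a in ancestors(el))]

# B: //*[not(ancestor-or-self::*[contains(@class, 'image-container')])]
B = {el for el in everything
     if not any(has_class(a, "image-container") for a in [el, *ancestors(el)])}

# A[count(. | B) = count(B)]: keep x when adding it to B leaves the size
# unchanged, i.e. when x is already a member of B.
result = [el for el in A if len(B | {el}) == len(B)]


def path(el):
    steps = [el.tag] + [a.tag for a in ancestors(el)]
    return "/" + "/".join(reversed(steps))


for el in result:
    print(path(el))
```

The printed paths match the list above, with `inner` in place of `...`.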
Upvotes: 5