Iale

Reputation: 678

XPath - get parent of text nodes with condition

<doc ok="yes">
    <a>
        <b>
            <c>
                aa
                <d ok="yes">
                    bb
                </d>
                cc
            </c>
        </b>
    </a>
    <e>
        ee
    </e>
    <f ok="no">
        no
    </f>
</doc>

I need to retrieve a list of nodes using XPath, where each node must satisfy these conditions:

  1. node has at least one child text node

  2. if the node (or closest node in ancestor axis) has an attribute "ok", the value must be "yes"

  3. when any ancestor is a part of the result, exclude node

So in my sample I would like to get <c> and <e>. Node <d> is excluded because it is a child of <c>, which is a part of the result.

I've started with condition (1) using this expression: //*[count(./text()[normalize-space()])>0]. It returns <c>, <d>, <e> and <f>. I have no idea how to exclude <d>.

Upvotes: 4

Views: 1547

Answers (1)

har07

Reputation: 89325

I would divide this into 2 steps. First, consider only conditions 1 and 2.

//*[text()[normalize-space()]]
   [
      not(ancestor-or-self::*[@ok])
        or 
      ancestor-or-self::*[@ok][1][@ok='yes']
    ]

Given the XML in the question as input, the above XPath returns 3 elements: <c>, <d>, and <e>.
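As a cross-check of this first step, here is a minimal sketch in Python that re-implements conditions 1 and 2 procedurally. It does not run the XPath itself: the standard library's ElementTree only supports a small XPath subset without ancestor axes, so the helper names (`has_text_child`, `nearest_ok`, `passes_1_and_2`) are made up for illustration and are not part of any API.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = """<doc ok="yes">
    <a>
        <b>
            <c>
                aa
                <d ok="yes">
                    bb
                </d>
                cc
            </c>
        </b>
    </a>
    <e>
        ee
    </e>
    <f ok="no">
        no
    </f>
</doc>"""

root = ET.fromstring(XML)
# ElementTree has no parent pointers, so build a child -> parent map.
parent = {child: p for p in root.iter() for child in p}


def has_text_child(el):
    """Condition 1: at least one non-whitespace child text node."""
    # Child text nodes are el.text plus the .tail of each child element.
    chunks = [el.text] + [child.tail for child in el]
    return any(chunk and chunk.strip() for chunk in chunks)


def nearest_ok(el):
    """Value of the closest "ok" attribute on el or its ancestors, else None."""
    while el is not None:
        if "ok" in el.attrib:
            return el.attrib["ok"]
        el = parent.get(el)
    return None


def passes_1_and_2(el):
    """Conditions 1 and 2: has text, and the nearest "ok" (if any) is "yes"."""
    return has_text_child(el) and nearest_ok(el) in (None, "yes")


# root.iter() walks the tree in document order.
step1 = [el.tag for el in root.iter() if passes_1_and_2(el)]
print(step1)
```

On the sample document this selects the same three elements as the XPath above; <f> drops out because its nearest "ok" value is "no".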

The next step is to implement condition 3. That can be done by repeating the predicates used in the first step, but applied to ancestor::* instead of the current node, and then negating the whole thing with not(): we want every ancestor to fail conditions 1 and 2, so that no ancestor of the current node is already part of the result:

[not(
        ancestor::*[text()[normalize-space()]]
        [
            not(ancestor-or-self::*[@ok])
                or 
            ancestor-or-self::*[@ok][1][@ok='yes']
        ]
    )
]

Combining both steps, you get the following XPath:

//*[text()[normalize-space()]]
   [
      not(ancestor-or-self::*[@ok])
        or 
      ancestor-or-self::*[@ok][1][@ok='yes']
    ]
    [not(
            ancestor::*[text()[normalize-space()]]
            [
                not(ancestor-or-self::*[@ok])
                    or 
                ancestor-or-self::*[@ok][1][@ok='yes']
            ]
        )
    ]

The outer predicates ([]) in the final XPath represent, in order, conditions 1, 2, and 3.
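To sanity-check the expected result for all three conditions, the following is a rough sketch in plain Python using only the standard library. It mirrors the logic of the predicates rather than evaluating the XPath (ElementTree's XPath subset has no ancestor axes), and the helper names `passes_1_and_2` and `selected` are hypothetical, chosen just for this example.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = """<doc ok="yes">
    <a>
        <b>
            <c>
                aa
                <d ok="yes">
                    bb
                </d>
                cc
            </c>
        </b>
    </a>
    <e>
        ee
    </e>
    <f ok="no">
        no
    </f>
</doc>"""

root = ET.fromstring(XML)
# ElementTree has no parent pointers, so build a child -> parent map.
parent = {child: p for p in root.iter() for child in p}


def ancestors(el):
    """Yield the ancestors of el, nearest first."""
    el = parent.get(el)
    while el is not None:
        yield el
        el = parent.get(el)


def passes_1_and_2(el):
    """Conditions 1 and 2 from the question."""
    # Condition 1: some child text node (el.text or a child's tail) is non-blank.
    chunks = [el.text] + [child.tail for child in el]
    if not any(chunk and chunk.strip() for chunk in chunks):
        return False
    # Condition 2: the nearest "ok" attribute on el or an ancestor, if any, is "yes".
    for node in [el, *ancestors(el)]:
        if "ok" in node.attrib:
            return node.attrib["ok"] == "yes"
    return True


def selected(el):
    """Condition 3 on top of 1 and 2: no ancestor may pass conditions 1 and 2."""
    return passes_1_and_2(el) and not any(passes_1_and_2(a) for a in ancestors(el))


result = [el.tag for el in root.iter() if selected(el)]
print(result)
```

For the sample XML this yields <c> and <e>: <d> passes conditions 1 and 2 on its own but is dropped because its ancestor <c> also passes them.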

Upvotes: 9
