JvdV
JvdV

Reputation: 75850

Extract nodes in between specific requirements

I'm trying to come up with a clean Xpath 1.0 expression to return all nodes in between specific nodes. The requirements are:

To test what I'm doing you could use:

<t>
    <s>ABC</s>
    <s>123</s>
    <s>DEF</s>
    <s>456</s>
    <s>GHI</s>
    <s>JKL</s>
    <s>(M)</s>
    <s>NOP</s>
    <s>(Q)</s>
</t>

To only end up with both 'GHI' and 'JKL' I came up with:

//s[position()<count(//s[starts-with(., '(')][1]/preceding::*)+1][position()>count(//s[.*0=0][last()]/preceding::*)+1]

It works just fine, however I can't help but feeling this can be done much smoother. To do so I though there should be a way to use the retrieved nodes in prior Xpath, e.g:

Any ideas, or am I stuck with what I got in the first place? I'm using this in the Excel function FILTERXML().

Upvotes: 0

Views: 94

Answers (1)

Jack Fleeting
Jack Fleeting

Reputation: 24930

It's very difficult to do this with xpath 1.0 (though it's very easy with xpath 2.0), but since you are using Excel and FILTERXML(), the following is a possible workaround:

Assuming your xml is in A1, then in B1 user FILTERXML() for the first set:

=FILTERXML(A1,"//s[translate(., translate(.,'0123456789',''), '')][last()]//following::s")

Credit here goes to the "double translate" method.

And again in the C1:

=FILTERXML(A1,"//s[starts-with(., '(')][1]/preceding::s")

Then it's back to Excel. Either:

=FILTER(B:B,COUNTIF(C:C,B:B))

or

=IF(ISERROR(MATCH(B:B,C:C,0)),"",B:B)

This should output 'GHI' and 'JKL'.

Per OP's comment below, there is also a way to do that with one xpath 1.0 expression using the Kayessian method, which in this case would take the form of:

//s[translate(., translate(.,'0123456789',''), '')][last()]//following::s[count(. | //s[starts-with(., '(')][1]/preceding::s) = count(//s[starts-with(., '(')][1]/preceding::s)]

and, strangely enough FILTERXML() understands it!

Upvotes: 1

Related Questions