Reputation: 75850
I'm trying to come up with a clean Xpath 1.0 expression to return all nodes in between specific nodes. The requirements are:
To test what I'm doing you could use:
<t>
<s>ABC</s>
<s>123</s>
<s>DEF</s>
<s>456</s>
<s>GHI</s>
<s>JKL</s>
<s>(M)</s>
<s>NOP</s>
<s>(Q)</s>
</t>
To only end up with both 'GHI' and 'JKL' I came up with:
//s[position()<count(//s[starts-with(., '(')][1]/preceding::*)+1][position()>count(//s[.*0=0][last()]/preceding::*)+1]
It works just fine, however I can't help but feeling this can be done much smoother. To do so I though there should be a way to use the retrieved nodes in prior Xpath, e.g:
[.*0=0][last()]/following::s
.[starts-with(., '(')][1]
.//s[.*0=0][last()]/following::s[position()<[starts-with(.,'(')][1]]
which obviously is incorrect.Any ideas, or am I stuck with what I got in the first place? I'm using this in the Excel function FILTERXML()
.
Upvotes: 0
Views: 94
Reputation: 24930
It's very difficult to do this with xpath 1.0 (though it's very easy with xpath 2.0), but since you are using Excel and FILTERXML()
, the following is a possible workaround:
Assuming your xml is in A1, then in B1 user FILTERXML()
for the first set:
=FILTERXML(A1,"//s[translate(., translate(.,'0123456789',''), '')][last()]//following::s")
Credit here goes to the "double translate" method.
And again in the C1:
=FILTERXML(A1,"//s[starts-with(., '(')][1]/preceding::s")
Then it's back to Excel. Either:
=FILTER(B:B,COUNTIF(C:C,B:B))
or
=IF(ISERROR(MATCH(B:B,C:C,0)),"",B:B)
This should output 'GHI' and 'JKL'.
Per OP's comment below, there is also a way to do that with one xpath 1.0 expression using the Kayessian method, which in this case would take the form of:
//s[translate(., translate(.,'0123456789',''), '')][last()]//following::s[count(. | //s[starts-with(., '(')][1]/preceding::s) = count(//s[starts-with(., '(')][1]/preceding::s)]
and, strangely enough FILTERXML()
understands it!
Upvotes: 1