JvdV

Reputation: 75850

Count unique child-nodes that hold specific substring

I have a small question about Excel's FILTERXML function: I would like to return all parent <t> nodes that have more than 2 unique child nodes containing the text of their parent. To visualize this:

<x>
  <t>A
    <s>A|x</s>
    <s>A|y</s>
    <s>B|y</s>
    <s>B|z</s>
  </t>
  <t>B
    <s>B|x</s>
    <s>B|y</s>
    <s>B|Z</s>
    <s>A|x</s>
  </t>
  <t>C
    <s>C|x</s>
    <s>C|y</s>
    <s>C|x</s>
    <s>A|x</s>
  </t>
</x>

So what I would like to return here is t-node B, since that's the only one that has more than 2 unique children that hold their parent's text, B.

Therefore I came up with the following expression:

//t[count(.//*[contains(.,concat(../text(),'|'))])>2]

This works fine to return B but also returns C, since it doesn't yet account for unique values. Therefore I tried to extend this expression:

//t[count(.//*[contains(.,concat(../text(),'|'))][.//*[not(preceding::*=.)]])>2]

However, it now returns no t-nodes. Where did I go wrong in my extended expression, and how can I fix it to return only B in this case?

Upvotes: 1

Views: 150

Answers (1)

Jack Fleeting

Reputation: 24930

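Well, the wonders of XPath 1.0... The extra predicate [.//*[not(preceding::*=.)]] tests each s node for descendant elements, and s nodes have none, so it filters everything out. Checking the s node itself against its preceding siblings is doable, but it's ugly: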

//t[count(s[not(.=preceding-sibling::s)][contains(.,concat(normalize-space(../text()[1]),'|'))])>2]
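
To sanity-check the expected result outside Excel, here is a minimal sketch in Python that reimplements the same filter by hand. It assumes the sample XML from the question. The function name parents_with_unique_matches and its minimum parameter are made up for illustration. The standard-library ElementTree module is used only for parsing, since its XPath subset has no count() or contains().

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question.
XML = """<x>
  <t>A
    <s>A|x</s>
    <s>A|y</s>
    <s>B|y</s>
    <s>B|z</s>
  </t>
  <t>B
    <s>B|x</s>
    <s>B|y</s>
    <s>B|Z</s>
    <s>A|x</s>
  </t>
  <t>C
    <s>C|x</s>
    <s>C|y</s>
    <s>C|x</s>
    <s>A|x</s>
  </t>
</x>"""


def parents_with_unique_matches(xml_text, minimum=2):
    """Return the text of every <t> that has more than `minimum`
    distinct <s> values containing '<parent text>|'.

    Hypothetical helper: it mirrors the logic of the XPath, not its syntax.
    """
    root = ET.fromstring(xml_text)
    matches = []
    for t in root.iter("t"):
        # Leading text of <t>, trimmed (the role normalize-space() plays in the XPath).
        key = (t.text or "").strip() + "|"
        # A set drops duplicates, like the preceding-sibling check does.
        unique = {s.text for s in t.findall("s") if s.text and key in s.text}
        if len(unique) > minimum:
            matches.append(key[:-1])
    return matches


print(parents_with_unique_matches(XML))
```

On the sample data, C has three matching children but only two distinct ones (C|x appears twice), so it is dropped once duplicates are removed.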

Upvotes: 2
