Pagli

Reputation: 31

XPath with recursive definitions

I have a DTD like this:

    <!ELEMENT Root (Thread*)>
    <!ELEMENT Thread (ThreadId, Message)>
    <!ELEMENT Replies (Message+)>
    <!ELEMENT Message (timestamp, sender, recipient, subject, text, Replies?)>

So a thread has one message, and this message can have a 'Replies' node, which in turn contains messages that can have their own 'Replies' nodes, and so on down to the bottom of the structure.

Now what I want to do is to first retrieve the ID of the thread with the most messages and then retrieve the ID of the thread with the longest chain of nested replies.

It feels like a recursive problem, but I can't work out how to approach it in XPath. So far I have tried something like this:

      For $thread in //thread
      Count(descendant-or-self::$thread/message) 

For each thread I try to count the number of descendant message nodes, but this solution counts all the descendant nodes of the thread, including the Replies nodes.

I feel lost with this kind of problem, as I cannot figure out what to do in these 'recursive situations'.

Upvotes: 3

Views: 360

Answers (1)

Martin Honnen

Reputation: 167716

Assuming XPath 3.0 you can use e.g.

    let $max := max(/Root/Thread/count(.//Message))
    return /Root/Thread[count(.//Message) eq $max]/ThreadId

to find the id(s) of the thread(s) with most messages and I think

    let $max := max(/Root/Thread/Message//Replies[not(Message/Replies)]/count(ancestor::Replies))
    return /Root/Thread[Message//Replies[not(Message/Replies)]/count(ancestor::Replies) = $max]/ThreadId

to find the id(s) of the thread(s) with the longest chain of nested replies.

XPath 2.0 has no let expressions, so there you would have to inline the expression bound to $max in my samples at each place where the variable is referenced.
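
As a rough sketch of that inlining, assuming the same document structure as in the let-based samples, the two queries might look like this in XPath 2.0. The maximum now appears inside the predicate, so it may be recomputed for every thread unless the processor optimizes it:

```xpath
(: id(s) of the thread(s) with the most messages :)
/Root/Thread[count(.//Message) eq max(/Root/Thread/count(.//Message))]/ThreadId

(: id(s) of the thread(s) with the longest chain of nested replies :)
/Root/Thread[Message//Replies[not(Message/Replies)]/count(ancestor::Replies)
             = max(/Root/Thread/Message//Replies[not(Message/Replies)]/count(ancestor::Replies))]/ThreadId
```

These are two separate expressions to be evaluated one at a time; the `(: ... :)` comments are only there for orientation.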

In XPath 3.1 you have a sort function (https://www.w3.org/TR/xpath-functions-31/#func-sort), so instead of computing the maximum and selecting the items with the maximum, you could sort and take the last item (which yields only one thread even when several tie for the maximum), e.g.

    sort(/Root/Thread, (), function($t) { max($t/Message//Replies[not(Message/Replies)]/count(ancestor::Replies)) })[last()]/ThreadId

for the second, more complex query or

    sort(/Root/Thread, (), function($t) { count($t//Message) })[last()]/ThreadId

for the first one.

Upvotes: 2
