Macin

Reputation: 391

XPath sorting not persistent?

I have the following XML:

<doc>
<ActivityNarrativeInformation>
  <ActivityID>123456789</ActivityID>
  <ActivityNarrativeInformationID>111111111</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>1</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>She Sells Sea Shells by the Sea Shore and she also</ActivityNarrativeText>
  </ActivityNarrativeInformation>
 <ActivityNarrativeInformation>
  <ActivityID>123456789</ActivityID>
  <ActivityNarrativeInformationID>111111111</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>3</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>triple shot frappuccino, extra hot, with whipped cream in a tall cup </ActivityNarrativeText>
</ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>123456789</ActivityID>
  <ActivityNarrativeInformationID>111111111</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>2</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>likes to take long walks on the beach while she drinks a</ActivityNarrativeText>
  </ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>987654321</ActivityID>
  <ActivityNarrativeInformationID>222222222</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>486</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>It was a dark and stormy night; the rain fell in torrents--except at occasional intervals, when
 </ActivityNarrativeText>
</ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>987654321</ActivityID>
  <ActivityNarrativeInformationID>222222222</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>488</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>scene lies), rattling along the housetops, and fiercely agitating the scanty flame of the lamps that
</ActivityNarrativeText>
</ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>987654321</ActivityID>
  <ActivityNarrativeInformationID>222222222</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>487</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>was checked by a violent gust of wind which swept up the streets (for it is in London that our
</ActivityNarrativeText>
</ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>987654321</ActivityID>
  <ActivityNarrativeInformationID>222222222</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>489</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>struggled against the darkness.
</ActivityNarrativeText>
</ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>55555555</ActivityID>
  <ActivityNarrativeInformationID>77777777</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>31921</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>Papa Bear was very big and growly. Mamma Bear was middle-sized and pleasant.
</ActivityNarrativeText>
</ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>55555555</ActivityID>
  <ActivityNarrativeInformationID>77777777</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>31923</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>Papa bear loved to fix things around the house; Mama bear loved to grow flowers in her garden; and, Baby bear loved playing in the yard. They were very happy. </ActivityNarrativeText>
</ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>55555555</ActivityID>
  <ActivityNarrativeInformationID>77777777</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>31920</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>Once upon a time there were three bears, Papa Bear, Mamma Bear and Baby Bear
</ActivityNarrativeText>
</ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>55555555</ActivityID>
  <ActivityNarrativeInformationID>77777777</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>31922</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>And Baby Bear, well, he was small, and
sometimes he squeaked! They lived in a pretty little house on the edge of the forest
</ActivityNarrativeText>
</ActivityNarrativeInformation>
</doc>

I need to group the ActivityNarrativeInformation elements by ActivityID and concatenate their ActivityNarrativeText values, sorted by ActivityNarrativeSequenceNumber.

I managed to sort the elements with the following XPath 3.1 query:

sort(//ActivityNarrativeInformation[ActivityID=123456789], (), function($ActivityNarrativeSequenceNumber) {$ActivityNarrativeSequenceNumber})

So the result looks like this:

<ActivityNarrativeInformation>
  <ActivityID>123456789</ActivityID>
  <ActivityNarrativeInformationID>111111111</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>1</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>She Sells Sea Shells by the Sea Shore and she also</ActivityNarrativeText>
  </ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>123456789</ActivityID>
  <ActivityNarrativeInformationID>111111111</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>2</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>likes to take long walks on the beach while she drinks a</ActivityNarrativeText>
  </ActivityNarrativeInformation>
<ActivityNarrativeInformation>
  <ActivityID>123456789</ActivityID>
  <ActivityNarrativeInformationID>111111111</ActivityNarrativeInformationID>
  <ActivityNarrativeSequenceNumber>3</ActivityNarrativeSequenceNumber>
  <ActivityNarrativeText>triple shot frappuccino, extra hot, with whipped cream in a tall cup </ActivityNarrativeText>
</ActivityNarrativeInformation>

The problem, however, is that if I want to narrow the result down to just the ActivityNarrativeText elements by adding /ActivityNarrativeText at the end, like this

sort(//ActivityNarrativeInformation[ActivityID=123456789], (), function($ActivityNarrativeSequenceNumber) {$ActivityNarrativeSequenceNumber})/ActivityNarrativeText

or

sort(//ActivityNarrativeInformation[ActivityID=123456789]/ActivityNarrativeText, (), function($seq) {$seq/ActivityNarrativeSequenceNumber})

The order is lost:

<ActivityNarrativeText>She Sells Sea Shells by the Sea Shore and she also</ActivityNarrativeText>
<ActivityNarrativeText>triple shot frappuccino, extra hot, with whipped cream in a tall cup </ActivityNarrativeText>
<ActivityNarrativeText>likes to take long walks on the beach while she drinks a</ActivityNarrativeText>

What am I doing wrong?

Upvotes: 0

Views: 106

Answers (4)

Reino

Reputation: 3423

Testing it here: videlibri.de/cgi-bin/xidelcgi

If you're using Xidel, then please add its tag, and maybe the tag for your shell as well (Windows or Unix).

I'm not too sure this can be done with XPath. I believe you're better off using XQuery.

For the narrative with <ActivityID>123456789</ActivityID> you could do:

$ xidel -s input.xml --xquery '
  normalize-space(
    for $x in //ActivityNarrativeInformation
    where $x/ActivityID = 123456789
    order by $x/ActivityNarrativeSequenceNumber
    return
    $x/ActivityNarrativeText
  )
'

For all narratives I'd suggest:

$ xidel -s input.xml --xquery '
  for $narrative at $i in //ActivityNarrativeInformation
  group by $id:=$narrative/ActivityID
  count $i
  return (
    $i,
    normalize-space(
      for $seq in $narrative
      order by $seq/ActivityNarrativeSequenceNumber
      return
      $seq/ActivityNarrativeText
    )
  )
'
1
Once upon a time there were three bears, [...]
2
She Sells Sea Shells by the Sea Shore and [...]
3
It was a dark and stormy night; the rain [...]

Group by <ActivityID> first, then in another for-loop order the sentences by <ActivityNarrativeSequenceNumber>.
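For readers outside XQuery, the same group-then-sort idea can be sketched in Python with the standard library's xml.etree.ElementTree. This is only an illustration and not part of the Xidel solution: the function name join_narratives and the shortened sample texts are made up here, and converting the sequence number with int() is an assumption that the numbers should sort numerically rather than as strings.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated sample in the question's format (texts shortened for illustration).
SAMPLE = """<doc>
  <ActivityNarrativeInformation>
    <ActivityID>123456789</ActivityID>
    <ActivityNarrativeSequenceNumber>1</ActivityNarrativeSequenceNumber>
    <ActivityNarrativeText>She Sells Sea Shells</ActivityNarrativeText>
  </ActivityNarrativeInformation>
  <ActivityNarrativeInformation>
    <ActivityID>123456789</ActivityID>
    <ActivityNarrativeSequenceNumber>3</ActivityNarrativeSequenceNumber>
    <ActivityNarrativeText>triple shot frappuccino </ActivityNarrativeText>
  </ActivityNarrativeInformation>
  <ActivityNarrativeInformation>
    <ActivityID>123456789</ActivityID>
    <ActivityNarrativeSequenceNumber>2</ActivityNarrativeSequenceNumber>
    <ActivityNarrativeText>while she drinks a</ActivityNarrativeText>
  </ActivityNarrativeInformation>
  <ActivityNarrativeInformation>
    <ActivityID>987654321</ActivityID>
    <ActivityNarrativeSequenceNumber>487</ActivityNarrativeSequenceNumber>
    <ActivityNarrativeText>was checked by a violent gust
</ActivityNarrativeText>
  </ActivityNarrativeInformation>
  <ActivityNarrativeInformation>
    <ActivityID>987654321</ActivityID>
    <ActivityNarrativeSequenceNumber>486</ActivityNarrativeSequenceNumber>
    <ActivityNarrativeText>It was a dark and stormy night;
</ActivityNarrativeText>
  </ActivityNarrativeInformation>
</doc>"""


def join_narratives(xml_text):
    """Group records by ActivityID, then join their texts in sequence order."""
    root = ET.fromstring(xml_text)

    # Step 1: group the records by ActivityID.
    groups = defaultdict(list)
    for info in root.iter("ActivityNarrativeInformation"):
        groups[info.findtext("ActivityID")].append(info)

    # Step 2: inside each group, sort by the numeric sequence number and
    # join the whitespace-normalized texts with single spaces.
    result = {}
    for activity_id, infos in groups.items():
        infos.sort(key=lambda i: int(i.findtext("ActivityNarrativeSequenceNumber")))
        result[activity_id] = " ".join(
            " ".join(i.findtext("ActivityNarrativeText").split()) for i in infos
        )
    return result


narratives = join_narratives(SAMPLE)
for activity_id, text in narratives.items():
    print(activity_id, text)
```

The inner " ".join(s.split()) plays the role of normalize-space() here: it trims the text and collapses runs of whitespace, including the trailing line breaks in the sample.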

Update 2021-07-05: I forgot about XPath's ! operator. With it, one for-loop is enough:

$ xidel -s input.xml --xquery '
  for $narrative at $i in //ActivityNarrativeInformation
  order by $narrative/ActivityNarrativeSequenceNumber
  group by $id:=$narrative/ActivityID
  count $i
  return (
    $i,
    normalize-space($narrative ! ActivityNarrativeText)
  )
'

Upvotes: 1

Martin Honnen

Reputation: 167581

In addition to the correct advice to use ! instead of / after sorting, one of your attempts would actually work if the sort key function selected the right element:

sort(//ActivityNarrativeInformation[ActivityID=123456789]/ActivityNarrativeText, (), function($text) {$text/../ActivityNarrativeSequenceNumber})
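Why the key matters can be modelled in Python, assuming (as in fn:sort) a stable sort: a key function that selects nothing gives every item the same empty key, so the input order is simply kept. The sketch below uses xml.etree.ElementTree; key_from_child and key_from_parent are hypothetical names, the sample texts are shortened, and the int() conversion is an added assumption that the keys should compare numerically.

```python
import xml.etree.ElementTree as ET

# Three records of one activity, with sequence numbers 1, 3, 2 in document order.
SAMPLE = """<doc>
  <ActivityNarrativeInformation>
    <ActivityNarrativeSequenceNumber>1</ActivityNarrativeSequenceNumber>
    <ActivityNarrativeText>She Sells</ActivityNarrativeText>
  </ActivityNarrativeInformation>
  <ActivityNarrativeInformation>
    <ActivityNarrativeSequenceNumber>3</ActivityNarrativeSequenceNumber>
    <ActivityNarrativeText>triple shot</ActivityNarrativeText>
  </ActivityNarrativeInformation>
  <ActivityNarrativeInformation>
    <ActivityNarrativeSequenceNumber>2</ActivityNarrativeSequenceNumber>
    <ActivityNarrativeText>while she drinks</ActivityNarrativeText>
  </ActivityNarrativeInformation>
</doc>"""

root = ET.fromstring(SAMPLE)
# ElementTree has no parent pointers, so build a child -> parent lookup.
parent_of = {child: parent for parent in root.iter() for child in parent}
texts = root.findall("ActivityNarrativeInformation/ActivityNarrativeText")


def key_from_child(text_el):
    # Model of $seq/ActivityNarrativeSequenceNumber on a text element:
    # the text element has no such child, so every key is empty.
    found = text_el.find("ActivityNarrativeSequenceNumber")
    return "" if found is None else found.text


def key_from_parent(text_el):
    # Model of $text/../ActivityNarrativeSequenceNumber: go up to the
    # enclosing record and read its sequence number.
    return int(parent_of[text_el].findtext("ActivityNarrativeSequenceNumber"))


by_child = [t.text for t in sorted(texts, key=key_from_child)]
by_parent = [t.text for t in sorted(texts, key=key_from_parent)]
print(by_child)   # ['She Sells', 'triple shot', 'while she drinks']
print(by_parent)  # ['She Sells', 'while she drinks', 'triple shot']
```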

Upvotes: 1

BeniBela

Reputation: 16917

You lose the order when you write /ActivityNarrativeText: it returns the <ActivityNarrativeText> elements in the same order they have in the input file.

Applying /something to a sequence of nodes does not just map each node to its children.

It means

  • Map it

  • Reorder all nodes to the input document order

  • Remove duplicates

You could use !ActivityNarrativeText instead, which maps each node without reordering.
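The difference between the two operators can be modelled in a few lines of Python. This is a rough sketch of the semantics, not how an XPath processor is implemented: slash_step and bang_step are hypothetical helper names, the sample texts are shortened, and the initial sort uses int() on the sequence number as an assumption.

```python
import xml.etree.ElementTree as ET

# Three records whose sequence numbers are 1, 3, 2 in document order.
SAMPLE = """<doc>
  <ActivityNarrativeInformation>
    <ActivityNarrativeSequenceNumber>1</ActivityNarrativeSequenceNumber>
    <ActivityNarrativeText>first</ActivityNarrativeText>
  </ActivityNarrativeInformation>
  <ActivityNarrativeInformation>
    <ActivityNarrativeSequenceNumber>3</ActivityNarrativeSequenceNumber>
    <ActivityNarrativeText>third</ActivityNarrativeText>
  </ActivityNarrativeInformation>
  <ActivityNarrativeInformation>
    <ActivityNarrativeSequenceNumber>2</ActivityNarrativeSequenceNumber>
    <ActivityNarrativeText>second</ActivityNarrativeText>
  </ActivityNarrativeInformation>
</doc>"""

root = ET.fromstring(SAMPLE)
# Position of every element in document order, used to model what "/" does.
doc_pos = {el: i for i, el in enumerate(root.iter())}


def bang_step(nodes, child_tag):
    """Model of nodes ! child_tag: map each node to its children, keep the order."""
    return [child for node in nodes for child in node.findall(child_tag)]


def slash_step(nodes, child_tag):
    """Model of nodes / child_tag: map, remove duplicates, restore document order."""
    mapped = bang_step(nodes, child_tag)
    unique = dict.fromkeys(mapped)  # drops duplicates, keeps first occurrence
    return sorted(unique, key=lambda el: doc_pos[el])


sorted_records = sorted(
    root.findall("ActivityNarrativeInformation"),
    key=lambda el: int(el.findtext("ActivityNarrativeSequenceNumber")),
)

bang_result = [el.text for el in bang_step(sorted_records, "ActivityNarrativeText")]
slash_result = [el.text for el in slash_step(sorted_records, "ActivityNarrativeText")]
print(bang_result)   # ['first', 'second', 'third']
print(slash_result)  # ['first', 'third', 'second']
```

The second result is back in input order even though the records were sorted first, which matches the behaviour described in the question.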

Upvotes: 2

Jack Fleeting

Reputation: 24930

If what you want is to extract a coherent sentence from your sample XML for that particular ActivityID, this expression

string-join(sort(//ActivityNarrativeInformation[ActivityID=123456789], (), function($info) {number($info/ActivityNarrativeSequenceNumber)}) ! concat(normalize-space(ActivityNarrativeText), " "))

should output

She Sells Sea Shells by the Sea Shore and she also likes to take long walks on the beach while she drinks a triple shot frappuccino, extra hot, with whipped cream in a tall cup 

Upvotes: 1
