Reputation: 13
I am trying to remove duplicate elements at a lower level of my XML document, because the system cannot process them. So far I have not had much success.
The XML has several <Article> children under <Articles>. The <Article> elements can have <UNIT> elements. These need to be unique across the whole document, but only with respect to the <NR>/<COUNT> combination.
Given the following example:
<Articles>
<Article>
<A1>123</A1>
<A2>456</A2>
<UNIT>
<NR>59</NR>
<COUNT>3</COUNT>
<TEXT>RANDOM Aqfwfqf</TEXT>
</UNIT>
<UNIT>
<NR>59</NR>
<COUNT>3</COUNT>
<TEXT>RANDOM hrthe</TEXT>
</UNIT>
<UNIT>
<NR>59</NR>
<COUNT>59</COUNT>
<TEXT>RANDOM cutrh</TEXT>
</UNIT>
</Article>
<Article>
<A1>351</A1>
<A2>362</A2>
<UNIT>
<NR>59</NR>
<COUNT>4</COUNT>
<TEXT>RANDOM rtjrtf</TEXT>
</UNIT>
<UNIT>
<NR>59</NR>
<COUNT>3</COUNT>
<TEXT>RANDOM jrtj</TEXT>
</UNIT>
<UNIT>
<NR>59</NR>
<COUNT>59</COUNT>
<TEXT>RANDOM rtjrt</TEXT>
</UNIT>
</Article>
</Articles>
The result should look like:
<Articles>
<Article>
<A1>123</A1>
<A2>456</A2>
<UNIT>
<NR>59</NR>
<COUNT>3</COUNT>
<TEXT>RANDOM Aqfwfqf</TEXT>
</UNIT>
<UNIT>
<NR>59</NR>
<COUNT>59</COUNT>
<TEXT>RANDOM cutrh</TEXT>
</UNIT>
</Article>
<Article>
<A1>351</A1>
<A2>362</A2>
<UNIT>
<NR>59</NR>
<COUNT>4</COUNT>
<TEXT>RANDOM rtjrtf</TEXT>
</UNIT>
</Article>
</Articles>
I tried to string-join the two values in <UNIT> and then delete the nodes, but I ended up deleting all of the <UNIT> elements instead of leaving one.
Getting a distinct list and counting the occurrences worked, but I couldn't delete the excess nodes.
How can I reduce each <NR>/<COUNT> combination to a single <UNIT> node?
Upvotes: 0
Views: 73
Reputation: 167516
For me, the following works:
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method 'xml';
declare option output:indent 'yes';
declare context item := document {
<Articles>
<Article>
<A1>123</A1>
<A2>456</A2>
<UNIT>
<NR>59</NR>
<COUNT>3</COUNT>
<TEXT>RANDOM Aqfwfqf</TEXT>
</UNIT>
<UNIT>
<NR>59</NR>
<COUNT>3</COUNT>
<TEXT>RANDOM hrthe</TEXT>
</UNIT>
<UNIT>
<NR>59</NR>
<COUNT>59</COUNT>
<TEXT>RANDOM cutrh</TEXT>
</UNIT>
</Article>
<Article>
<A1>351</A1>
<A2>362</A2>
<UNIT>
<NR>59</NR>
<COUNT>4</COUNT>
<TEXT>RANDOM rtjrtf</TEXT>
</UNIT>
<UNIT>
<NR>59</NR>
<COUNT>3</COUNT>
<TEXT>RANDOM jrtj</TEXT>
</UNIT>
<UNIT>
<NR>59</NR>
<COUNT>59</COUNT>
<TEXT>RANDOM rtjrt</TEXT>
</UNIT>
</Article>
</Articles>
};
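(: Group all UNIT elements by NR and COUNT, then delete every member of a group except the first :)
. transform with {
  delete node
    for $unit in //UNIT
    group by $nr := $unit/NR, $cnt := $unit/COUNT
    return subsequence($unit, 2)
}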
This applies the update to an in-memory context node. If your input is a database document, I think running
delete node for $unit in //UNIT
group by $nr := $unit/NR, $cnt := $unit/COUNT
return subsequence($unit, 2)
would work just fine.
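To see why this leaves exactly one <UNIT> per <NR>/<COUNT> combination, it may help to run the same FLWOR expression without the delete, as a read-only preview of the nodes that would be removed. This is only a sketch: the shortened sample document and its TEXT values (a to d) are made up for illustration, and $doc is a hypothetical variable name. After the group by clause, $unit is bound to all UNIT elements of one group, so subsequence($unit, 2) is every member of the group except the first.

```xquery
(: Read-only preview: returns the UNIT elements the delete would remove. :)
(: The sample data below is invented for illustration only. :)
let $doc := document {
  <Articles>
    <Article>
      <UNIT><NR>59</NR><COUNT>3</COUNT><TEXT>a</TEXT></UNIT>
      <UNIT><NR>59</NR><COUNT>3</COUNT><TEXT>b</TEXT></UNIT>
    </Article>
    <Article>
      <UNIT><NR>59</NR><COUNT>3</COUNT><TEXT>c</TEXT></UNIT>
      <UNIT><NR>59</NR><COUNT>4</COUNT><TEXT>d</TEXT></UNIT>
    </Article>
  </Articles>
}
for $unit in $doc//UNIT
(: one group per distinct NR/COUNT pair, across all Article elements :)
group by $nr := $unit/NR, $cnt := $unit/COUNT
(: after grouping, $unit holds the whole group; skip its first member :)
return subsequence($unit, 2)
```

With this sample, only the 59/3 group has more than one member, so the query should return the two UNIT elements whose TEXT is b and c, while the first 59/3 unit and the single 59/4 unit are kept. Because the grouping runs over //UNIT rather than per <Article>, duplicates in later articles are caught as well.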
Upvotes: 1