Reputation: 85
I would count all distinct items considering parent and child tag.
For example, if the document contains:
<root>
<A value="5.1">
<B value="3.5">
<C value="1.4">
<D value="0.2">
<E value="X"/>
</D>
</C>
</B>
</A>
<A value="5.1">
<B value="3.5">
<C value="1.4">
<D value="0.4">
<E value="Y"/>
</D>
</C>
</B>
</A>
<A value="4.6">
<B value="3.1">
<C value="1.5">
<D value="0.2">
<E value="X"/>
</D>
</C>
</B>
</A>
<A value="5.0">
<B value="3.6">
<C value="1.4">
<D value="0.2">
<E value="X"/>
</D>
</C>
</B>
</A>
</root>
I would count all distinct items for AB, ABC, ABCD, BCD etc.
I tried this xquery command, but got only the distinct values for individual items:
count(distinct-values(doc('partitioncollection')/root/A/@value)) -> 3
count(distinct-values(doc('partitioncollection')/root/B/@value)) -> 3
count(distinct-values(doc('partitioncollection')/root/C/@value)) -> 2
More specifically, if I want calculate the counts of all possibile pairs of value, the results should show:
A/B : 3
A/B/C : 3
A/B/C/D : 4
A/B/C/D/E : 4
B/C : 3
B/C/D : 4
B/C/D/E : 4
C/D : 3
C/D/E : 3
D/E : 2
C/E : should be 3 because there are 3 distinct pairs of values:
<C value="1.4"><E value="X"/>
<C value="1.4"><E value="Y"/>
<C value="1.5"><E value="X"/>
A/D : should be 4 because there are 4 distinct pairs of values:
<A value="5.1"><D value="0.2">
<A value="5.1"><D value="0.4">
<A value="4.6"><D value="0.2">
<A value="5.0"><D value="0.2">
etc.
Being computationally complex, I believe it is easier to create a function that takes as a single set of tags (i.e. A-D or C-E etc.) and returns the value of the counter
Upvotes: 0
Views: 409
Reputation: 167716
I am still not sure the problem is precisely described (it is not clear whether the structure is always A/B/C/D/E
, it is not clear whether for deeper levels you only want to compare e.g. B/C
if they are children of an A
with the same value) but in general I think this can be solve using grouping and recursion; I came up with
declare function local:distinct-descendants($elements as element()*) as xs:string*
{
for $element-group in $elements[*]
group by $element-name := node-name($element-group)
let $max-depth := max($element-group/count(.//*)) + 1
for $level in 2 to $max-depth
let $groups :=
for $path-group in $element-group
group by $group-path := string-join(subsequence($path-group/descendant-or-self::*/node-name(), 1, $level), '/')
for $value-group in $path-group
group by $value-seq := string-join(subsequence($value-group/descendant-or-self::*/@value, 1, $level), '|')
return head($group-path)
for $level-group in $groups
group by $level-path := $level-group
order by head($level)
return $level-path || ' : ' || count($level-group)
,
if ($elements/*) then local:distinct-descendants($elements/*) else ()
};
local:distinct-descendants(root/*)
At https://xqueryfiddle.liberty-development.net/nbUY4kz/4 this outputs
A/B : 3
A/B/C : 3
A/B/C/D : 4
A/B/C/D/E : 4
B/C : 3
B/C/D : 4
B/C/D/E : 4
C/D : 3
C/D/E : 3
D/E : 2
I get the same result with BaseX.
The code might be a bit too complicated if the structure is always A/B/C/D/E
, in that case the first grouping on the node-name()
is not needed.
I also had to group twice inside, on the "path" sequence (e.g. A/B/C
) and on the @value
sequence, there might be easier ways but I wasn't able to relate unique @value
sequences with the the nesting structure without doing the duplicate grouping.
Perhaps
declare function local:distinct-descendants($elements as element()*) as xs:string*
{
let $max-depth := max($elements/count(descendant-or-self::*))
for $level in 2 to $max-depth
let $groups :=
for $path-group in $elements
group by
$group-path := string-join(subsequence($path-group/descendant-or-self::*/node-name(), 1, $level), '/'),
$value-seq := string-join(subsequence($path-group/descendant-or-self::*/@value, 1, $level), '|')
return head($group-path)
for $level-group in $groups
group by $level-path := $level-group
order by head($level)
return $level-path || ' : ' || count($level-group)
,
if ($elements/*) then local:distinct-descendants($elements/*) else ()
};
local:distinct-descendants(root/*)
at https://xqueryfiddle.liberty-development.net/nbUY4kz/6 is a bit simpler and still does the job.
I think the previous two attempts work fine as long as the subtrees all have one child but break if not; thus in the general case of an arbitrary number of child elements it seems necessary to compute paths and values from descendants and group them:
declare variable $value-separator as xs:string external := '|';
declare function local:distinct-descendants($elements as element()*) as xs:string*
{
for $element-group in $elements[*]
group by $element-name := node-name($element-group)
return
(
let $groups :=
for $descendant-group in $element-group//*
group by
$path-key := string-join(($descendant-group/ancestor-or-self::* except $element-group/ancestor::*)/node-name(), '/'),
$value-key := string-join(($descendant-group/ancestor-or-self::* except $element-group/ancestor::*)/@value, $value-separator)
return $path-key
for $path-group in $groups
group by $path-key := $path-group
return $path-key || ' : ' || count($path-group)
,
if ($element-group/*) then local:distinct-descendants($element-group/*) else ()
)
};
local:distinct-descendants(root/*)
https://xqueryfiddle.liberty-development.net/nbUY4kz/14
Upvotes: 2