Reputation: 815
In my database (XQuery 3.1, eXist-db 4.7) I have 12,000+ TEI XML documents (examples), each of which can have multiple references to a single stand-alone TEI document of keywords.
Each of these 12,000 example documents looks like the following, with a variable number of keyword references:
<TEI type="example" group="X">
    <teiHeader>some content</teiHeader>
    <text>
        <front>
            <div type="keywords">
                <list type="keywords">
                    <item type="keyword" corresp="KW0002"/>
                    <item type="keyword" corresp="KW0034"/>
                    <item type="keyword" corresp="KW0349"/>
                    <item type="keyword" corresp="KW0670"/>
                    <item type="keyword" corresp="KW1987"/>
                </list>
            </div>
        </front>
    </text>
</TEI>
The keyword document contains 2000+ categories, each identified by an xml:id and containing translations in 5 languages:
<category xml:id="KW0001">
    <desc xml:lang="de">geliebter</desc>
    <desc xml:lang="en">lover</desc>
    <desc xml:lang="es">amante</desc>
    <desc xml:lang="fr">amant</desc>
    <desc xml:lang="it">amante</desc>
</category>
<category xml:id="KW0002">
    <desc xml:lang="de">bischof</desc>
    <desc xml:lang="en">bishop</desc>
    <desc xml:lang="es">obispo</desc>
    <desc xml:lang="fr">évêque</desc>
    <desc xml:lang="it">vescovo</desc>
</category>
The objective of my query is to get all keywords in a selection (@group) of examples, then group and count them for output as HTML.
My current solution takes a long time, despite having indexed all the elements and attributes. I suspect there is a more efficient way of putting this together, but I can't see it.
let $cols := collection($mydatabase)//TEI[@group="X"]
let $kwdoc := doc("keywords.xml")//category
(: every distinct keyword id referenced by the selected examples :)
let $kws := distinct-values($cols//item[@type="keyword"]/data(@corresp))
let $lis :=
    for $kw in $kws
    (: rescans all selected examples once per keyword :)
    let $count := count($cols//item[@type="keyword" and @corresp=$kw])
    order by $count descending
    return
        <li>
            <a href="{concat("www.example.com/keywords/",$kw)}">
                {for $x in $kwdoc[@xml:id=$kw]/tei:desc
                 return <span class="{$x/@xml:lang}">{$x/text()}</span>}
                ({$count})
            </a>
        </li>
return <ul>{$lis}</ul>
This produces HTML items that look like this:
<ul>
    <li>
        <a href="www.example.com/keywords/KW0001">
            <span class="de">geliebter</span>
            <span class="en">lover</span>
            <span class="es">amante</span>
            <span class="fr">amant</span>
            <span class="it">amante</span>
            (64)
        </a>
    </li>
    <li>
        <a href="www.example.com/keywords/KW0002">
            <span class="de">bischof</span>
            <span class="en">bishop</span>
            <span class="es">obispo</span>
            <span class="fr">évêque</span>
            <span class="it">vescovo</span>
            (64)
        </a>
    </li>
</ul>
Many thanks in advance.
Upvotes: 0
Views: 116
Reputation: 167726
I think in XQuery 3 you should do that grouping with group by; hopefully that also performs better:
let $cols := collection($mydatabase)//TEI[@group="X"]
(: fn:id expects a single node, so bind the keyword document itself :)
let $kwdoc := doc("keywords.xml")
let $lis :=
    for $group in $cols//item[@type = "keyword"]
    (: after "group by", $group holds all items sharing the same @corresp :)
    group by $keyword := $group/@corresp
    order by count($group) descending
    return
        <li>
            <a href="{concat("www.example.com/keywords/",$keyword )}">
                {for $desc in id($keyword, $kwdoc)/desc
                 return <span class="{$desc/@xml:lang}">{$desc/text()}</span>}
                ({count($group)})
            </a>
        </li>
return <ul>{$lis}</ul>
The only issue I haven't quite understood is whether the TEI documents in $cols can reference keywords that are not in the keyword document. The code shown above does not check for that.
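If such dangling references can occur, one possible way to make them visible is to look up the category once per group and branch on whether it exists. The following is only a minimal sketch under stated assumptions: the inline $items and $categories are hypothetical stand-ins for $cols//item and the contents of keywords.xml, no namespace is used, and the "unknown" class is an invented name for illustration.

```xquery
xquery version "3.1";

(: Hypothetical inline data standing in for $cols//item and keywords.xml;
   no namespace is assumed here. :)
let $items := (
    <item type="keyword" corresp="KW0002"/>,
    <item type="keyword" corresp="KW0002"/>,
    <item type="keyword" corresp="KW9999"/>
)
let $categories :=
    <category xml:id="KW0002">
        <desc xml:lang="en">bishop</desc>
        <desc xml:lang="fr">évêque</desc>
    </category>
let $lis :=
    for $item in $items
    group by $keyword := string($item/@corresp)
    (: empty when the keyword document has no matching category :)
    let $category := $categories[@xml:id = $keyword]
    order by count($item) descending
    return
        if (exists($category)) then
            <li>
                {for $desc in $category/desc
                 return <span class="{$desc/@xml:lang}">{string($desc)}</span>}
                ({count($item)})
            </li>
        else
            (: assumption: show the raw id so the dangling reference is visible :)
            <li class="unknown">{$keyword} ({count($item)})</li>
return <ul>{$lis}</ul>
```

Whether unknown keywords should be listed, logged, or silently dropped depends on the application; replacing the else branch with the empty sequence () would drop them.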
Upvotes: 1