Reputation: 815
Environment: eXist-DB 4.4 / XQuery 3.1
I have hundreds of TEI XML documents in which the named entities persName and placeName are encoded. The documents are in
collection("db/fooapp/data")
Each instance of persName and placeName has an attribute @nymRef, which contains a single value that refers to an xml:id in one of two master documents:
db/fooapp/data/codes_persons.xml
db/fooapp/data/codes_places.xml
These master documents contain, among other things, the canonical name of each person or place.
I am frequently doing single lookups for a certain single name, for example
let $x := "fooplace" (: some @nymRef value :)
let $y := doc("db/fooapp/data/codes_places.xml")//tei:place[@xml:id=$x]//tei:placeName/text()
return $y
But there are times when I need to do this while cycling through huge lists. For example, across all the documents I need to output the xml:id of each seg that has one or more child elements placeName/@nymRef:
<seg xml:id="fooref">some text<placeName nymRef="fooplace"/>some text</seg>
The task is to obtain all the seg/@xml:id values and then look up and output the canonical name of any placeName/@nymRef underneath each seg. This results in numerous round trips that are really inefficient, but I do not know any other means to do this in eXist-DB. The costly round trip is expressed at let $c, repeated for every return:
let $coll := collection("db/fooapp/data")
for $a in $coll//seg
for $b in $a//placeName
let $c := doc("db/fooapp/data/codes_places.xml")//tei:place[@xml:id=$b/data(@nymRef)]//tei:placeName/text()
return
<tr>
<td>{$a/@xml:id/string()}</td>
<td>{$c}</td>
</tr>
This can add up to hundreds of round trips for a single table output.
I have no objections to restructuring the task into multiple functions if necessary.
Many thanks in advance.
Upvotes: 0
Views: 76
Reputation: 733
Please provide us with an input xml and the desired output, otherwise there is no way to rewrite your query. We also need to see your index configuration.
Some general advice, for avoiding roundtrips:
First off, see my previous answer to your question on the use of ft:query(). When doing [@xml:id=$b/data(@nymRef)], is eXist using indexes, or are you forcing it to do a string comparison without having an index configured on that string?
id() is the fastest way possible to look up xml:id values.
distinct-values is your friend for looking up each distinct key:value pair only once.
Use a single for loop to avoid iterating over the same data multiple times.
Whenever possible go for more restrictive XPath expressions; // probably loads a lot of unnecessary data into memory.
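Without your input XML this can only be a rough sketch, but the points above could be combined roughly as follows. It assumes that seg, place and placeName are all in the TEI namespace, that the first tei:placeName inside each tei:place is the canonical name, and it reuses the paths from your question unchanged. The variable names ($places, $segs, $names) are made up for illustration.

```xquery
xquery version "3.1";

(: Assumption: all elements are in the standard TEI namespace. :)
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Open the master document once, outside of any loop. :)
let $places := doc("db/fooapp/data/codes_places.xml")

(: Only fetch seg elements that actually contain a placeName with @nymRef. :)
let $segs := collection("db/fooapp/data")//tei:seg[.//tei:placeName/@nymRef]

(: Resolve every distinct key exactly once via id() and keep the
   results in a map of key -> canonical name.
   Assumption: the first tei:placeName under the place is the canonical one. :)
let $names := map:merge(
    for $key in distinct-values($segs//tei:placeName/@nymRef/string())
    return map:entry($key, string(($places/id($key)//tei:placeName)[1]))
)

(: Build the rows from the in-memory map; no further document lookups. :)
for $seg in $segs
for $ref in $seg//tei:placeName/@nymRef
return
    <tr>
        <td>{ $seg/@xml:id/string() }</td>
        <td>{ $names(string($ref)) }</td>
    </tr>
```

The idea is that the master document is touched once per distinct @nymRef value rather than once per placeName occurrence, and each of those lookups goes through id() instead of a predicate comparison. Whether this is actually faster for your data still depends on your index configuration.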
All of these and more can be found in the documentation.
Upvotes: 1