Reputation: 89
I have an XML file with a bunch of <entry> elements in it (see below). I would like to extract most of the information given in each <entry> container and put it into an (X)HTML document.
I'm able to perform a search and get the wanted element contents. If I search for the term ἄγγελος either in entry/hyperlemma/orth (path A) or in cit/hyperlemma/orth (path B), it is found once in entry01 in path A and twice in entry02 in path B.
The idea is to print the content of each entry container in which ἄγγελος was found, regardless of the number of occurrences. Because the term was found twice in entry02, that entry is (of course) printed twice, but I only need it once. Is that possible with XQuery? And if so, how would I do it?
My XML:
<text>
  <entry xml:id="01">
    <hyperlemma>ἄγγελος</hyperlemma>
    <lemma>ἄγγελος</lemma>
    <variant>τῶν ἀγγέλων
      <hyperlemma>
        <orth>ἄγγελος</orth>
      </hyperlemma>
    </variant>
  </entry>
  <entry xml:id="02">
    <hyperlemma>
      <orth>ангелъ</orth>
    </hyperlemma>
    <lemma>
      <orth>ангелъ</orth>
    </lemma>
    <variant>
      <orth>анг꙯ла</orth>
      <hyperlemma>
        <orth>ангелъ</orth>
      </hyperlemma>
      <cit>
        <hyperlemma>
          <orth>ἄγγελος</orth>
        </hyperlemma>
        <lemma>
          <orth>ἄγγελον</orth>
        </lemma>
      </cit>
    </variant>
    <variant>
      <orth>анг꙯лъ</orth>
      <hyperlemma>
        <orth>ангелъ</orth>
      </hyperlemma>
      <cit>
        <hyperlemma>
          <orth>ἄγγελος</orth>
        </hyperlemma>
        <lemma>
          <orth>ἄγγελος</orth>
        </lemma>
      </cit>
    </variant>
  </entry>
</text>
My XQuery:
xquery version "3.0";
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "xml";
declare variable $searchphrase := "ἄγγελος";
<html>
  <head>
    <meta HTTP-EQUIV="Content-Type" content="text/html; charset=UTF-8"/>
  </head>
  <body>
    <h1>Output of searchterm</h1>
    <p>You are looking for "<font color="red"><strong>{$searchphrase}</strong></font>"</p>
    {
      let $hyperlemmas := doc("sample_entry.xml")/(descendant::entry | descendant::cit)/hyperlemma/orth[contains(., $searchphrase)]
      return
        <p>{$searchphrase} was found {count($hyperlemmas)} times.</p>
    }
    {
      let $hyperlemmas := doc("sample_entry.xml")/(descendant::entry | descendant::cit)/hyperlemma/orth[contains(., $searchphrase)]
      for $hyperlemma in $hyperlemmas
      let $entry_id := $hyperlemma/ancestor::entry/@xml:id
      let $lemma := $hyperlemma/ancestor::entry/lemma/orth
      let $variant := $hyperlemma/ancestor::entry/variant/orth
      return
        <div>
          Entry {string($entry_id)}:<br/>
          Lemma: {$lemma} //
          {
            for $form in $variant
            return
              <i>{$form}</i>
          }
        </div>
    }
  </body>
</html>
Upvotes: 1
Views: 269
Reputation: 89
I finally figured out myself how to print only certain elements inside the entry tag when a given search term can be found at different positions. Here is the rewritten XQuery code that (for now) works for me and gives the intended results:
xquery version "3.0";
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "xml";
declare variable $searchphrase := "ἄγγελος";
<html>
  <head>
    <meta HTTP-EQUIV="Content-Type" content="text/html; charset=UTF-8"/>
  </head>
  <body>
    <h1>Output of searchterm</h1>
    <p>You are looking for "<font color="red"><strong>{$searchphrase}</strong></font>"</p>
    {
      let $hyperlemmas := doc("sample_entry.xml")/(descendant::entry | descendant::cit)/hyperlemma/orth[contains(., $searchphrase)]
      let $ids := $hyperlemmas/ancestor::entry/@xml:id
      return
        <p>{$searchphrase} was found {count($hyperlemmas)} times. IDs: {data($ids)} </p>
    }
    {
      let $entry_base := doc("sample_entry.xml")/text
      for $entry in $entry_base/entry
      let $id := $entry/@xml:id
      let $variant := $entry/variant/orth
      let $found_pos1 := $entry/hyperlemma/orth
      let $found_pos2 := $entry/descendant::cit/hyperlemma/orth
      where $found_pos1 = $searchphrase or $found_pos2 = $searchphrase
      return
        <div>ID {data($id)}:<br/>Lemma: {$entry/lemma/orth}<br/>
          {
            for $item in $variant
            return
              <div>Variant: {$item}
                {
                  for $cit in $item/../cit/lemma
                  where exists($cit)
                  return
                    <i>-> {$cit}</i>
                }
              </div>
          }
        </div>
    }
  </body>
</html>
Upvotes: 1
Reputation: 38682
As you're using XQuery 3.0, a quick solution would be to group by entry IDs, which you're resolving anyway:
(: snip :)
for $hyperlemma in $hyperlemmas
let $entry_id := $hyperlemma/ancestor::entry/@xml:id
group by $entry_id
let $lemma := $hyperlemma/ancestor::entry/lemma/orth
let $variant := $hyperlemma/ancestor::entry/variant/orth
return
(: snip :)
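With the snipped parts filled in, the grouped loop might look roughly like the following. This is only a sketch: it assumes the file name sample_entry.xml, the search term, and the output markup from the question, and it leaves out the surrounding HTML page.

```xquery
xquery version "3.0";

(: Assumption: same file name and search term as in the question. :)
declare variable $searchphrase := "ἄγγελος";

let $hyperlemmas :=
  doc("sample_entry.xml")/(descendant::entry | descendant::cit)
    /hyperlemma/orth[contains(., $searchphrase)]
for $hyperlemma in $hyperlemmas
let $entry_id := $hyperlemma/ancestor::entry/@xml:id
(: One group, and therefore one <div>, per distinct entry ID. :)
group by $entry_id
(: After grouping, $hyperlemma holds all matches of the current entry;
   the path expressions below still return each node only once. :)
let $lemma := $hyperlemma/ancestor::entry/lemma/orth
let $variant := $hyperlemma/ancestor::entry/variant/orth
return
  <div>
    Entry {string($entry_id)}:<br/>
    Lemma: {$lemma} //
    { for $form in $variant return <i>{$form}</i> }
  </div>
```

The order in which groups are returned is implementation-dependent, so an order by $entry_id clause after the group by may be worth adding if the output order matters.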
A more elegant solution (though it pretty much amounts to a complete rewrite of the query) would be to loop over the entry elements instead and, for each of those, find the first match and print it.
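A minimal sketch of that entry-based loop, under the same assumptions as the question (file name sample_entry.xml, the two search paths entry/hyperlemma/orth and cit/hyperlemma/orth, and the original output markup), could be:

```xquery
xquery version "3.0";

(: Assumption: same file name and search term as in the question. :)
declare variable $searchphrase := "ἄγγελος";

for $entry in doc("sample_entry.xml")/text/entry
(: Keep an entry if at least one orth on either path contains the term.
   A non-empty node sequence counts as true in a where clause. :)
where ($entry/hyperlemma/orth | $entry//cit/hyperlemma/orth)
        [contains(., $searchphrase)]
return
  <div>
    Entry {string($entry/@xml:id)}:<br/>
    Lemma: {$entry/lemma/orth} //
    { for $form in $entry/variant/orth return <i>{$form}</i> }
  </div>
```

Because the loop runs once per entry, each matching entry is printed exactly once, no matter how many times the term occurs inside it.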
Upvotes: 1