smo

Reputation: 89

XQuery: How to retrieve the contents of an element once only despite multiple occurrences of a searchterm?

I have an XML file containing a number of <entry> elements (see below). I would like to extract most of the information in each <entry> container and put it into an (X)HTML document.

I'm able to perform a search and get the wanted element contents. If I search for the term ἄγγελος either in entry/hyperlemma/orth (path A) or in cit/hyperlemma/orth (path B), it is found once in entry01 in path A and twice in entry02 in path B.

The idea is to print the content of each entry container in which ἄγγελος was found, regardless of the number of occurrences. Because the term is found twice in entry02, that entry is (of course) printed twice, but I only need it once. Is that possible with XQuery? And if so, how would I do it?

My XML:

<text>
    <entry xml:id="01">
        <hyperlemma>ἄγγελος</hyperlemma>
        <lemma>ἄγγελος</lemma>
        <variant>τῶν ἀγγέλων
            <hyperlemma>
                <orth>ἄγγελος</orth>
            </hyperlemma>
        </variant>
    </entry>
    <entry xml:id="02">
        <hyperlemma>
            <orth>ангелъ</orth>
        </hyperlemma>
        <lemma>
            <orth>ангелъ</orth>
        </lemma>
        <variant>
            <orth>анг꙯ла</orth>
            <hyperlemma>
                <orth>ангелъ</orth>
            </hyperlemma>
            <cit>
                <hyperlemma>
                    <orth>ἄγγελος</orth>
                </hyperlemma>
                <lemma>
                    <orth>ἄγγελον</orth>
                </lemma>
            </cit>
        </variant>
        <variant>
            <orth>анг꙯лъ</orth>
            <hyperlemma>
                <orth>ангелъ</orth>
            </hyperlemma>
            <cit>
                <hyperlemma>
                    <orth>ἄγγελος</orth>
                </hyperlemma>
                <lemma>
                    <orth>ἄγγελος</orth>
                </lemma>
            </cit>
        </variant>
    </entry>
</text>

My XQuery:

xquery version "3.0";
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "xml";
declare variable $searchphrase := "ἄγγελος";
<html>
    <head>
        <meta HTTP-EQUIV="Content-Type" content="text/html; charset=UTF-8"/>
    </head>
    <body>
        <h1>Output of searchterm</h1>
        <p>You are looking for "<font color="red"><strong>{$searchphrase}</strong></font>"</p>
        {
        let $hyperlemmas := doc("sample_entry.xml")/(descendant::entry | descendant::cit)/hyperlemma/orth [contains(., $searchphrase)]
        return
        <p>{$searchphrase} was found {count($hyperlemmas)} times.</p>
        }
        {
        let $hyperlemmas := doc("sample_entry.xml")/(descendant::entry | descendant::cit)/hyperlemma/orth [contains(., $searchphrase)]
        for $hyperlemma in $hyperlemmas
        let $entry_id := $hyperlemma/ancestor::entry/@xml:id
        let $lemma := $hyperlemma/ancestor::entry/lemma/orth
        let $variant := $hyperlemma/ancestor::entry/variant/orth
        return
        <div>
            Entry {string($entry_id)}:<br/>
            Lemma: {$lemma} //
            {
            for $form in $variant
            return
            <i>{$form}</i>
            }
        </div>      
        }
    </body>
</html>

Upvotes: 1

Views: 269

Answers (2)

smo

Reputation: 89

I finally figured out myself how to print only certain elements inside the entry tag when a given search term can be found at different positions. Here is the rewritten XQuery code, which (for now) works for me and gives the intended results:

xquery version "3.0";
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method   "xml";
declare variable $searchphrase := "ἄγγελος";
<html>
    <head>
        <meta HTTP-EQUIV="Content-Type" content="text/html; charset=UTF-8"/>
    </head>
    <body>
        <h1>Output of searchterm</h1>
        <p>You are looking for "<font color="red"><strong>{$searchphrase}</strong></font>"</p>
        {
        let $hyperlemmas := doc("sample_entry.xml")/(descendant::entry | descendant::cit)/hyperlemma/orth [contains(., $searchphrase)]
        let $ids := $hyperlemmas/ancestor::entry/@xml:id
        return
        <p>{$searchphrase} was found {count($hyperlemmas)} times. IDs: {data($ids)} </p>
        }
        {   
        let $entry_base := doc("sample_entry.xml")/text

        for $entry in $entry_base/entry
        let $id := $entry/@xml:id
        let $variant := $entry/variant/orth
        let $found_pos1 := $entry/hyperlemma/orth
        let $found_pos2 := $entry/descendant::cit/hyperlemma/orth
        where $found_pos1 = $searchphrase or $found_pos2 = $searchphrase
        return
        <div>ID {data($id)}:<br/>Lemma: {$entry/lemma/orth}<br/>
            {
            for $item in $variant
            return
            <div>Variant: {$item}
                {
                for $cit in $item/../cit/lemma
                where exists($cit)
                return
                <i>-> {$cit}</i>
                }
            </div>
            }
        </div>
        }
    </body>
</html>

Upvotes: 1

Jens Erat

Reputation: 38682

As you're using XQuery 3.0, a quick solution would be to group by entry IDs, which you're resolving anyway:

(: snip :)
for $hyperlemma in $hyperlemmas
let $entry_id := $hyperlemma/ancestor::entry/@xml:id
group by $entry_id
let $lemma := $hyperlemma/ancestor::entry/lemma/orth
let $variant := $hyperlemma/ancestor::entry/variant/orth
return
  (: snip :)

A more elegant solution (though it pretty much amounts to a complete rewrite of the query) would be to loop over the entry elements instead and, for each of them, find the first match and print the entry.
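
A minimal sketch of that entry-based loop, assuming the same sample_entry.xml and the same two search paths as in the question (entry/hyperlemma/orth and cit/hyperlemma/orth). The output markup is simply copied from the question's query, so treat it as an illustration rather than a finished rewrite:

```xquery
xquery version "3.0";
declare variable $searchphrase := "ἄγγελος";

for $entry in doc("sample_entry.xml")/text/entry
(: Keep an entry if at least one orth on either path contains the term.
   exists() only asks whether there is a match, so the number of
   occurrences inside one entry no longer matters. :)
where exists(
    ($entry/hyperlemma/orth | $entry//cit/hyperlemma/orth)[contains(., $searchphrase)]
)
return
    <div>
        Entry {string($entry/@xml:id)}:<br/>
        Lemma: {$entry/lemma/orth} //
        {
        for $form in $entry/variant/orth
        return <i>{$form}</i>
        }
    </div>
```

Because the loop iterates over entries rather than over matches, each entry can be returned at most once, and no grouping or deduplication step is needed afterwards.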

Upvotes: 1
