Roberto Fernandez Diaz
Roberto Fernandez Diaz

Reputation: 804

Optimize slow query xquery

Right now the query takes around 2 minutes , before I made some changes it took 3:48m.

The xml documents are taken from a web pages , due to it changes each 5m and gives information about the buses in real time.

Could you help me to optimize this query?

xquery version "3.0";
declare namespace bus="http://docs.gijon.es/sw/busgijon.asmx";

declare function local:getNombreParada($numero)
{
    for $parada in doc("http://datos.gijon.es/doc/transporte/busgijoninfo.xml")//paradas/bus:parada
    where $numero=$parada/bus:idparada
    return $parada/bus:descripcion
};

declare function local:getBusesPorLinea($linea)
{

    let $numero:=$linea
    let $nBuses:=count(doc("http://datos.gijon.es/doc/transporte/busgijontr.xml")//bus:llegada[bus:idlinea=$numero])

    return 
    if($nBuses=0)
    then(<p>No hay ningun bus en esta linea</p>)
    else(
    <div>
        <h2>Numero de buses funcionando en la linea {$numero} : {$nBuses}</h2>

    <table class="table table-hover">
        <thead>
          <tr>
            <th>Parada</th>
            <th>Minutos hasta la llegada</th>
          </tr>
        </thead>
        <tbody>
            {
            for $l in doc("http://datos.gijon.es/doc/transporte/busgijontr.xml")//bus:llegada[bus:idlinea=$numero]
                for $parada in doc("http://datos.gijon.es/doc/transporte/busgijoninfo.xml")//paradas/bus:parada[bus:idparada=$l/bus:idparada]


            return <tr>
                        <td>{$parada/bus:descripcion}</td>
                        <td>{$l/bus:minutos}</td></tr>
            }
        </tbody>
    </table>

    </div>
    )


};

local:getBusesPorLinea(1)

PD:i am running this in exist Db

Upvotes: 0

Views: 901

Answers (4)

DiZzZz
DiZzZz

Reputation: 656

Another tip: for queries in eXist-db, it is better to avoid to use the where clause. An XPath predicate typically performs much better.

Quite a few tips are on http://exist-db.org/exist/apps/doc/tuning.xml?q=performance&field=all&id=D2.2.2#D2.2.6

Upvotes: 1

westbaystars
westbaystars

Reputation: 151

First of all, the best way to optimize queries in eXist is to store the XML locally and with indexes. Please use the built in documentation for how to setup indexes.

However, your code suffers from fetching the same data over and over from the network. Let's take care of that and the other issue, your use of in-memory XML querying, another optimization bottleneck.

The most important first step is to have the XML you're querying in the local database. Queries of nodes in the database are faster and use less memory than queries against in-memory XML nodes. (At least, that was that case with versions up to 2.2.)

So, here is a method to cache the data locally, refreshing the cache after the latest update is more than 5 minutes old.

xquery version "3.0";

declare namespace bus="http://docs.gijon.es/sw/busgijon.asmx";

(: Store the XML data in the collection /db/busgijon/data :)
declare variable $COL := "/db/busgijon/data";
declare variable $INFO-FILE := "busgijoninfo.xml";
declare variable $TR-FILE := "busgijontr.xml";

(: Fetch a page from cache or from web site, updating the cache :)
declare function local:fetchPage($filename) {
    (: If the page was fetched more than 5 minutes ago, refresh it :)
    let $expire := current-dateTime() - xs:dayTimeDuration('PT5M')
    let $page := doc($COL || "/" || $filename)/page
    return if (exists($page))
        then if ($page/xs:dateTime(@timestamp) ge $expire)
            then $page
            else (update replace $page/* with doc("http://datos.gijon.es/doc/transporte/" || $filename)/*
                , update value $page/@timestamp with current-dateTime()
                , $page)
        else doc(xmldb:store($COL, $filename, <page timestamp="{current-dateTime()}">{doc("http://datos.gijon.es/doc/transporte/" || $filename)/*}</page>))/page
};

declare function local:getBusesPorLinea($linea)
{
    (: Get the two pages from the database cache for querying :)
    let $info := local:fetchPage($INFO-FILE)/bus:BusGijonInfo
    let $tr := local:fetchPage($TR-FILE)/bus:BusGijonTr

    let $numero:=$linea
    let $nBuses:=count($tr//bus:llegada[bus:idlinea=$numero])

    return 
    if($nBuses=0)
    then(<p>No hay ningun bus en esta linea</p>)
    else(
    <div>
        <h2>Numero de buses funcionando en la linea {$numero} : {$nBuses}</h2>

    <table class="table table-hover">
        <thead>
          <tr>
            <th>Parada</th>
            <th>Minutos hasta la llegada</th>
          </tr>
        </thead>
        <tbody>
            {
            (: Loop through the TR page - fetched just once from cache :)
            for $l in $tr//bus:llegada[bus:idlinea=$numero]
                (: Loop through the Info page - fetched just once from cache :)
                for $parada in $info//paradas/bus:parada[bus:idparada=$l/bus:idparada]


            return <tr>
                        <td>{$parada/bus:descripcion}</td>
                        <td>{$l/bus:minutos}</td></tr>
            }
        </tbody>
    </table>

    </div>
    )


};

local:getBusesPorLinea(1)

The only part that I changed in the local:getBusesPorLinea function was fetching the two documents at the top from the cache and using those in the embedded loops.

The local:fetchPage function is where most of the speedup occurs. Here's what it does:

  • Set expiration time to 5 minutes in the past.
  • Try to fetch the specified page from the cache.
  • If the page exists, compare the fetched timestamp to the expiration timestamp.
  • If the page's timestamp is less than 5 minutes ago (greater than the expiration timestamp), return that page.
  • If the page's timestamp is greater than 5 minutes ago, refetch it, updating the contents of the page with the refreshed document, update the page's timestamp, and return the new page.
  • If the page does not yet exist, save the page to the specified collection with the current timestamp, returning the page element.

The first person to access this XQuery after 5 minutes have passed will have about an extra 5-10 seconds as the cache gets refreshed. This allows the cache to be passive, so you don't have to manually refresh it every five minutes.

Hope this helps.

Upvotes: 2

Michael Kay
Michael Kay

Reputation: 163262

Without smart optimization, this join expression:

for $l in doc("a.xml")//bus:llegada[bus:idlinea=$numero]
  for $parada in doc("b.xml")//paradas/bus:parada[bus:idparada=$l/bus:idparada]
return <tr>...</tr>

is going to have quadratic performance. You haven't told us anything about the document sizes, but that is where I would start looking.

The way you deal with such an issue in an XML database environment is typically by creating appropriate indexes.

Upvotes: 2

CiaPan
CiaPan

Reputation: 9571

Are documents cached? I'm not an expert, but your code seems to access the same document multiple times. That's OK if you're sure the content is cached in the executing environment. Otherwise I would try to declare

declare variable $docinfo := doc("http://datos.gijon.es/doc/transporte/busgijoninfo.xml");
declare variable $doctr   := doc("http://datos.gijon.es/doc/transporte/busgijontr.xml");

to make sure the files are fetched just once.

You also scan documents at least twice for same kind of data. I would do that once:

declare variable $paradas  := $docinfo//paradas;
declare variable $llegadas := $doctr//bus:llegada;

then only filter the collections:

declare function local:getNombreParada($numero)
{
    $paradas/bus:parada[bus:idparada = $numero]/bus:descripcion
};

declare function local:getBusesPorLinea($linea)
{
    let $numero:=$linea
    let $llegadasNum:=$llegadas[bus:idlinea=$numero]
    let $nBuses:=count($llegadasNum)

    return 

    if($nBuses=0)
    then(<p>No hay ningun bus en esta linea</p>)
    else(
    <div>
        <h2>Numero de buses funcionando en la linea {$numero} : {$nBuses}</h2>

    <table class="table table-hover">
        <thead>
          <tr>
            <th>Parada</th>
            <th>Minutos hasta la llegada</th>
          </tr>
        </thead>
        <tbody>
            {
            for $l in $llegadasNum
                for $parada in $paradas/bus:parada[bus:idparada=$l/bus:idparada]
                return <tr>
                        <td>{$parada/bus:descripcion}</td>
                        <td>{$l/bus:minutos}</td></tr>
            }
        </tbody>
    </table>

    </div>
    )
};

May be that is not much faster, but I hope it is a bit more readable.

Upvotes: 1

Related Questions