Reputation: 804
Right now the query takes around 2 minutes , before I made some changes it took 3:48m.
The xml documents are taken from a web pages , due to it changes each 5m and gives information about the buses in real time.
Could you help me to optimize this query?
xquery version "3.0";
declare namespace bus="http://docs.gijon.es/sw/busgijon.asmx";
declare function local:getNombreParada($numero)
{
for $parada in doc("http://datos.gijon.es/doc/transporte/busgijoninfo.xml")//paradas/bus:parada
where $numero=$parada/bus:idparada
return $parada/bus:descripcion
};
declare function local:getBusesPorLinea($linea)
{
let $numero:=$linea
let $nBuses:=count(doc("http://datos.gijon.es/doc/transporte/busgijontr.xml")//bus:llegada[bus:idlinea=$numero])
return
if($nBuses=0)
then(<p>No hay ningun bus en esta linea</p>)
else(
<div>
<h2>Numero de buses funcionando en la linea {$numero} : {$nBuses}</h2>
<table class="table table-hover">
<thead>
<tr>
<th>Parada</th>
<th>Minutos hasta la llegada</th>
</tr>
</thead>
<tbody>
{
for $l in doc("http://datos.gijon.es/doc/transporte/busgijontr.xml")//bus:llegada[bus:idlinea=$numero]
for $parada in doc("http://datos.gijon.es/doc/transporte/busgijoninfo.xml")//paradas/bus:parada[bus:idparada=$l/bus:idparada]
return <tr>
<td>{$parada/bus:descripcion}</td>
<td>{$l/bus:minutos}</td></tr>
}
</tbody>
</table>
</div>
)
};
local:getBusesPorLinea(1)
PD:i am running this in exist Db
Upvotes: 0
Views: 901
Reputation: 656
Another tip: for queries in eXist-db, it is better to avoid to use the where
clause. An XPath predicate typically performs much better.
Quite a few tips are on http://exist-db.org/exist/apps/doc/tuning.xml?q=performance&field=all&id=D2.2.2#D2.2.6
Upvotes: 1
Reputation: 151
First of all, the best way to optimize queries in eXist is to store the XML locally and with indexes. Please use the built in documentation for how to setup indexes.
However, your code suffers from fetching the same data over and over from the network. Let's take care of that and the other issue, your use of in-memory XML querying, another optimization bottleneck.
The most important first step is to have the XML you're querying in the local database. Queries of nodes in the database are faster and use less memory than queries against in-memory XML nodes. (At least, that was that case with versions up to 2.2.)
So, here is a method to cache the data locally, refreshing the cache after the latest update is more than 5 minutes old.
xquery version "3.0";
declare namespace bus="http://docs.gijon.es/sw/busgijon.asmx";
(: Store the XML data in the collection /db/busgijon/data :)
declare variable $COL := "/db/busgijon/data";
declare variable $INFO-FILE := "busgijoninfo.xml";
declare variable $TR-FILE := "busgijontr.xml";
(: Fetch a page from cache or from web site, updating the cache :)
declare function local:fetchPage($filename) {
(: If the page was fetched more than 5 minutes ago, refresh it :)
let $expire := current-dateTime() - xs:dayTimeDuration('PT5M')
let $page := doc($COL || "/" || $filename)/page
return if (exists($page))
then if ($page/xs:dateTime(@timestamp) ge $expire)
then $page
else (update replace $page/* with doc("http://datos.gijon.es/doc/transporte/" || $filename)/*
, update value $page/@timestamp with current-dateTime()
, $page)
else doc(xmldb:store($COL, $filename, <page timestamp="{current-dateTime()}">{doc("http://datos.gijon.es/doc/transporte/" || $filename)/*}</page>))/page
};
declare function local:getBusesPorLinea($linea)
{
(: Get the two pages from the database cache for querying :)
let $info := local:fetchPage($INFO-FILE)/bus:BusGijonInfo
let $tr := local:fetchPage($TR-FILE)/bus:BusGijonTr
let $numero:=$linea
let $nBuses:=count($tr//bus:llegada[bus:idlinea=$numero])
return
if($nBuses=0)
then(<p>No hay ningun bus en esta linea</p>)
else(
<div>
<h2>Numero de buses funcionando en la linea {$numero} : {$nBuses}</h2>
<table class="table table-hover">
<thead>
<tr>
<th>Parada</th>
<th>Minutos hasta la llegada</th>
</tr>
</thead>
<tbody>
{
(: Loop through the TR page - fetched just once from cache :)
for $l in $tr//bus:llegada[bus:idlinea=$numero]
(: Loop through the Info page - fetched just once from cache :)
for $parada in $info//paradas/bus:parada[bus:idparada=$l/bus:idparada]
return <tr>
<td>{$parada/bus:descripcion}</td>
<td>{$l/bus:minutos}</td></tr>
}
</tbody>
</table>
</div>
)
};
local:getBusesPorLinea(1)
The only part that I changed in the local:getBusesPorLinea function was fetching the two documents at the top from the cache and using those in the embedded loops.
The local:fetchPage function is where most of the speedup occurs. Here's what it does:
The first person to access this XQuery after 5 minutes have passed will have about an extra 5-10 seconds as the cache gets refreshed. This allows the cache to be passive, so you don't have to manually refresh it every five minutes.
Hope this helps.
Upvotes: 2
Reputation: 163262
Without smart optimization, this join expression:
for $l in doc("a.xml")//bus:llegada[bus:idlinea=$numero]
for $parada in doc("b.xml")//paradas/bus:parada[bus:idparada=$l/bus:idparada]
return <tr>...</tr>
is going to have quadratic performance. You haven't told us anything about the document sizes, but that is where I would start looking.
The way you deal with such an issue in an XML database environment is typically by creating appropriate indexes.
Upvotes: 2
Reputation: 9571
Are documents cached? I'm not an expert, but your code seems to access the same document multiple times. That's OK if you're sure the content is cached in the executing environment. Otherwise I would try to declare
declare variable $docinfo := doc("http://datos.gijon.es/doc/transporte/busgijoninfo.xml");
declare variable $doctr := doc("http://datos.gijon.es/doc/transporte/busgijontr.xml");
to make sure the files are fetched just once.
You also scan documents at least twice for same kind of data. I would do that once:
declare variable $paradas := $docinfo//paradas;
declare variable $llegadas := $doctr//bus:llegada;
then only filter the collections:
declare function local:getNombreParada($numero)
{
$paradas/bus:parada[bus:idparada = $numero]/bus:descripcion
};
declare function local:getBusesPorLinea($linea)
{
let $numero:=$linea
let $llegadasNum:=$llegadas[bus:idlinea=$numero]
let $nBuses:=count($llegadasNum)
return
if($nBuses=0)
then(<p>No hay ningun bus en esta linea</p>)
else(
<div>
<h2>Numero de buses funcionando en la linea {$numero} : {$nBuses}</h2>
<table class="table table-hover">
<thead>
<tr>
<th>Parada</th>
<th>Minutos hasta la llegada</th>
</tr>
</thead>
<tbody>
{
for $l in $llegadasNum
for $parada in $paradas/bus:parada[bus:idparada=$l/bus:idparada]
return <tr>
<td>{$parada/bus:descripcion}</td>
<td>{$l/bus:minutos}</td></tr>
}
</tbody>
</table>
</div>
)
};
May be that is not much faster, but I hope it is a bit more readable.
Upvotes: 1