Reputation: 317
I need to filter documents by date (last week, last month, etc.) with MarkLogic 8. The database contains 1.3 million XML documents.
The documents look like this:
<work datum_gegenereerd="2015-06-10" gegenereerd="2015-06-10T14:28:48" label="gmb-2015-12000">
...
I've created an element attribute range index on work/@datum_gegenereerd (scalar type date).
The following query works but is slow (3 seconds):
xquery version "1.0-ml";
for $a in //work
where xs:date($a/@datum_gegenereerd) > current-date()- 5*xs:dayTimeDuration('P1D')
return
<hit>{base-uri($a)}</hit>
After a lot of experimenting, it turns out that I can cut the run time to 0.02 seconds by removing the xs:date cast from the where clause:
xquery version "1.0-ml";
for $a in //work
where $a/@datum_gegenereerd > current-date()- 5*xs:dayTimeDuration('P1D')
return
<hit>{base-uri($a)}</hit>
Can anyone explain this behaviour?
Update:
When I delete the attribute range index, the second variant slows down to 3+ seconds as well, and recreating the index makes it fast again. This makes me wonder how to read David's statement below that there is no way to use a custom index from plain XQuery.
(BTW: the query returns 1267 XML documents, out of a possible 450000 documents with root element work in a total database of 1.35 million documents)
Update 2:
I got the 0.02-second measurement wrong, but the query is still very fast in the query console. Of the 3 versions, the cts:search one seems a tiny bit faster.
Upvotes: 4
Views: 1058
Reputation: 7770
You may have created an index, but you are not using it. You need to use an element-attribute-range-query to find all of the fragments that have dates in the range in question.
Something like:
cts:search(doc(), cts:element-attribute-range-query(xs:QName("work"), xs:QName("datum_gegenereerd"), ">", current-date() - 5 * xs:dayTimeDuration('P1D')))
However, if you really just want the URIs, use the element-attribute-range-query with cts:uris instead (something like this, but check the docs):
cts:uris('', (), cts:element-attribute-range-query(xs:QName("work"), xs:QName("datum_gegenereerd"), ">", current-date() - 5 * xs:dayTimeDuration('P1D')))
The second one does everything in memory and just pulls the URIs from the URI lexicon that point to document fragments where the date query matches.
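For completeness, here is a minimal sketch of how the cts:uris variant might be combined with the <hit> output from the question. It assumes the URI lexicon is enabled on the database and that the date attribute range index from the question exists; the variable name $cutoff is only introduced here for illustration.

```xquery
xquery version "1.0-ml";

(: Sketch only: assumes the URI lexicon is enabled and an
   element attribute range index of type date exists on
   work/@datum_gegenereerd. :)

(: Cutoff date: five days before today. $cutoff is an illustrative name. :)
let $cutoff := current-date() - xs:dayTimeDuration('P5D')

(: Resolve matching URIs from the lexicon and range index,
   without loading the documents themselves. :)
for $uri in cts:uris((), (),
  cts:element-attribute-range-query(
    xs:QName("work"),
    xs:QName("datum_gegenereerd"),
    ">",
    $cutoff))
return <hit>{$uri}</hit>
```

Since the range query is resolved from the index, timings should be compared against the two FLWOR variants on the same data before drawing conclusions.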
Upvotes: 7