Ranjan Sarma

Reputation: 1595

MarkLogic: Query response time is very high

I have around 15,000 records in XML format under a URI pattern, say "documents/products/specs/*.xml". Each XML document is around 25 kilobytes in size. I connect to this MarkLogic server from a remote Apache Tomcat server that has an XCC client (Java), which tries to execute an AdHocQuery that resembles something like this:

let $a := cts:uri-match('documents/products/specs/*.xml')
for $xml in $a
return fn:doc($xml)

(The for loop is implemented in Java.)

This works fine. But for a larger number of records, say 15,000, it takes 60 minutes, even though the server and internet speeds are very good. (The total size of all documents under the URI is around 20 MB, which should not take more than 20 minutes.)

Is there any workaround?

Upvotes: 2

Views: 521

Answers (3)

Charles Foster

Reputation: 338

Try this:

cts:search(
    fn:doc(),
    cts:document-query(
        cts:uri-match('documents/products/specs/*.xml')
    ), "unfiltered"
)

Upvotes: 2

Tyler Replogle

Reputation: 1339

The query is taking so long because the MarkLogic server is reading most of those files from disk, unless you have a really big tree cache size. What you need to do is narrow the scope of your query. Maybe add some indexes for the files.

All that said, if all you want to do is ETL that data out, then you might want to batch the requests.
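
To make the batching idea concrete, here is a minimal Java sketch, not a tested MarkLogic client. It only splits an already collected URI list into fixed-size batches and builds one fn:doc query string per batch. The class name UriBatcher, both method names, and the batch size are hypothetical; sending each query through XCC is left out, and in real code passing the URIs as external variables is probably safer than building query strings.

```java
import java.util.ArrayList;
import java.util.List;

class UriBatcher {
    // Split the URI list into consecutive batches of at most batchSize entries.
    static List<List<String>> partition(List<String> uris, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < uris.size(); start += batchSize) {
            int end = Math.min(start + batchSize, uris.size());
            batches.add(new ArrayList<>(uris.subList(start, end)));
        }
        return batches;
    }

    // Build an XQuery expression that fetches every document in one batch.
    // Inside an XQuery string literal, '&' is written as an entity and a
    // double quote is escaped by doubling it.
    static String buildBatchQuery(List<String> batch) {
        StringBuilder sb = new StringBuilder("fn:doc((");
        for (int i = 0; i < batch.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            String escaped = batch.get(i).replace("&", "&amp;").replace("\"", "\"\"");
            sb.append('"').append(escaped).append('"');
        }
        return sb.append("))").toString();
    }

    public static void main(String[] args) {
        // Hypothetical URIs, standing in for the result of cts:uri-match.
        List<String> uris = List.of(
                "documents/products/specs/1.xml",
                "documents/products/specs/2.xml",
                "documents/products/specs/3.xml");
        for (List<String> batch : partition(uris, 2)) {
            // Each printed query would be one request to the server.
            System.out.println(buildBatchQuery(batch));
        }
    }
}
```

A smaller batch size means more round trips but less data buffered per request, so the right value likely depends on document size and network latency.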

Upvotes: 0

DALDEI

Reputation: 3732

What you are doing is requesting the full body of ALL documents. This is not a typical query; rather, it is a DB dump. The query you show will buffer up all this data, then send it through Tomcat, which again buffers up all the data and then sends it to you. This is a large dataset to be sending in one request.

What is the intent of your query? If you want to get all documents, you should either dump them out using a program like mlcp or fetch them in smaller batches by first collecting the URIs and then fetching the documents. This can be sped up substantially by doing the document fetching in parallel. You can see examples of Java source in xmlsh that show how to fetch documents in XCC in parallel:

http://xmlsh.svn.sourceforge.net/viewvc/xmlsh/extensions/marklogic/src/org/xmlsh/marklogic/get.java?revision=792&view=markup
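
As a rough illustration of the parallel approach (this is not the xmlsh code, just a sketch under assumptions), the following Java uses a fixed thread pool to fetch documents for a list of URIs concurrently. The class ParallelFetcher and the fetchOne function are hypothetical; fetchOne stands in for whatever XCC call retrieves a single document, presumably using a separate session per task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelFetcher {
    // Fetch every URI using up to `threads` worker threads.
    // Results are returned in the same order as the input URIs.
    static List<String> fetchAll(List<String> uris,
                                 Function<String, String> fetchOne,
                                 int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String uri : uris) {
                Callable<String> task = () -> fetchOne.apply(uri);
                futures.add(pool.submit(task));
            }
            List<String> docs = new ArrayList<>();
            for (Future<String> future : futures) {
                docs.add(future.get()); // blocks until this document is ready
            }
            return docs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status
            throw new IllegalStateException("interrupted while fetching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a fetch task failed", e.getCause());
        } finally {
            pool.shutdown(); // stop accepting work; running tasks finish
        }
    }

    public static void main(String[] args) {
        // Hypothetical URIs; in practice these come from a first query
        // that only collects URIs.
        List<String> uris = List.of(
                "documents/products/specs/1.xml",
                "documents/products/specs/2.xml");
        // Placeholder fetcher: a real one would call the server.
        List<String> docs = fetchAll(uris, uri -> "<doc uri='" + uri + "'/>", 4);
        docs.forEach(System.out::println);
    }
}
```

The thread count is a tuning knob: too few threads leaves the server idle between round trips, while too many may simply shift the bottleneck to disk reads on the server.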

My guess (correct me if I am wrong) is that you are just experimenting and don't actually need all the docs. In that case, a more realistic query should be tried.

Upvotes: 1
