romain-nio

Reputation: 1205

MarkLogic 8: Schedule XQuery extraction

I'm currently using XQuery queries (launched via the API) to extract data from MarkLogic 8.0.6.

Query in my file extract_data.xqy:

xdmp:save("toto.csv",
  let $nl := "&#10;"
  return
    document {
      for $data in collection("http://book/polar")
      return $data
    })

API call :

$ curl --anyauth --user ${MARKLOGIC_USERNAME}:${MARKLOGIC_PASSWORD} -X POST -i -d @extract_data.xqy \
                -H "Content-type: application/x-www-form-urlencoded" \
                -H "Accept: multipart/mixed; boundary=BOUNDARY" \
                $node:$port/v1/eval?database=$db_name

It works fine, but I'd like to schedule this extract directly in MarkLogic and have it run in the background, to avoid a timeout if the request takes too long to execute.

Is there a feature for that?

Regards, Romain.

Upvotes: 1

Views: 87

Answers (2)

grtjn

Reputation: 20414

As suggested by Mads, a tool like CORB can help pull CSV data out of MarkLogic.

A schedule as suggested by Michael can trigger a periodic export, and save the output to disk, or push it elsewhere via HTTP. I'd look into running incremental exports in that case though, and I'd also suggest batching things up. In a large cluster, I'd even suggest chunking the export into batches per forest or per host on which content forests are attached. Scheduled tasks allow targeting specific hosts on which they should run.

You can also run an ad hoc export, particularly if you batch up the work using a tool like taskbot. If you combine it with its OPTIONS-SYNC-UPDATE mode, you can merge multiple batches back into one result file before emitting it, and get better performance than running it single-threaded. Merging results doesn't scale endlessly, but if you have a relatively small dataset (maybe only a few million small records), that might be sufficient.

HTH!

Upvotes: 0

Mike Gardner

Reputation: 6651

You can use the task scheduler to set up recurring script execution.
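As a rough sketch, a daily scheduled task could be created with the Admin API along these lines. The module path, modules root, database names, user name, group name and start time are all placeholders (none of them come from the question), so adjust them to your setup. Treat it as an illustration rather than a tested configuration.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: All names below are assumptions for illustration only. :)
let $config := admin:get-configuration()
let $group  := admin:group-get-id($config, "Default")

(: Run /extract_data.xqy once a day at 02:00. :)
let $task := admin:group-daily-scheduled-task(
  "/extract_data.xqy",          (: module to run, relative to the root :)
  "/",                          (: modules root :)
  1,                            (: period: every 1 day :)
  xs:time("02:00:00"),          (: start time :)
  xdmp:database("Documents"),   (: content database (placeholder) :)
  xdmp:database("Modules"),     (: modules database (placeholder) :)
  xdmp:user("admin"),           (: user the task runs as (placeholder) :)
  xdmp:host()                   (: pin the task to the current host :)
)

let $config := admin:group-add-scheduled-task($config, $group, $task)
return admin:save-configuration($config)
```

The same task can also be defined by hand in the Admin UI under the group's Scheduled Tasks page. The script is mostly useful if you want the setup to be repeatable.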

The timeout can be adjusted in the script with xdmp:set-request-time-limit.
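For example, something like the following at the top of the extract module would raise the limit for that one request. The 3600-second value is an arbitrary choice for illustration, and as far as I know the value is still bounded by the "max time limit" configured on the server that runs the request.

```xquery
xquery version "1.0-ml";

(: Raise the time limit of the current request to one hour (in seconds).
   3600 is an illustrative value, not a recommendation. :)
xdmp:set-request-time-limit(3600),

(: Then run the export as before. :)
xdmp:save("toto.csv",
  document {
    for $data in collection("http://book/polar")
    return $data
  })
```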

I would suggest you take a look at MLCP as well.

Upvotes: 4
