Reputation: 1205
I'm currently using XQuery queries (launched via the REST API) to extract data from MarkLogic 8.0.6.
The query in my file extract_data.xqy:
xdmp:save("toto.csv",
  let $nl := " "
  return
    document {
      for $data in collection("http://book/polar")
      return $data
    })
The API call (the URL is quoted so the shell does not interpret the `?`):
$curl --anyauth --user ${MARKLOGIC_USERNAME}:${MARKLOGIC_PASSWORD} -X POST -i -d @extract_data.xqy \
-H "Content-type: application/x-www-form-urlencoded" \
-H "Accept: multipart/mixed; boundary=BOUNDARY" \
"$node:$port/v1/eval?database=$db_name"
It works fine, but I'd like to schedule this extract directly in MarkLogic and have it run in the background, to avoid timeouts if the request takes too long to execute.
Is there a feature for doing that?
Regards, Romain.
Upvotes: 1
Views: 87
Reputation: 20414
As suggested by Mads, a tool like CORB can help pull CSV data out of MarkLogic.
A scheduled task, as suggested by Michael, can trigger a periodic export and save the output to disk, or push it elsewhere via HTTP. In that case I'd look into running incremental exports, and I'd also suggest batching things up. In a large cluster, I'd even suggest chunking the export into batches per forest, or per host on which content forests are attached. Scheduled tasks allow targeting the specific hosts on which they should run.
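An incremental export along those lines might look like the sketch below. It assumes the database maintains last-modified properties (the "maintain last modified" database setting) plus a dateTime range index on prop:last-modified, and the $since value, collection URI, and output path are placeholders rather than anything from the question:

```xquery
(: Hedged sketch of an incremental export: only documents in the collection
   whose prop:last-modified is newer than the previous run are written out.
   In a real scheduled task, $since would be read from a checkpoint document
   and updated after each successful run. :)
declare namespace prop = "http://marklogic.com/xdmp/property";

let $since := xs:dateTime("2024-01-01T00:00:00")
let $docs :=
  cts:search(collection("http://book/polar"),
    cts:properties-fragment-query(
      cts:element-range-query(xs:QName("prop:last-modified"), ">", $since)))
return
  xdmp:save("toto-incremental.csv", document { $docs })
```

Each run then only touches documents changed since the checkpoint, which keeps individual export requests small and fast.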
You can also run an ad hoc export, particularly if you batch up the work using a tool like taskbot. If you combine it with its OPTIONS-SYNC-UPDATE mode, you can merge multiple batches back into one result file before emitting it, and get better performance than running single-threaded. Merging results doesn't scale endlessly, but if you have a relatively small dataset (maybe only a few million small records), that might be sufficient.
HTH!
Upvotes: 0
Reputation: 6651
You can use the task scheduler to set up recurring script execution.
The timeout can be adjusted in the script with xdmp:set-request-time-limit.
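In the export script itself, that could look like the following sketch; the 3600-second value is an arbitrary example, and note it cannot exceed the app server's configured max time limit:

```xquery
(: Raise this request's timeout to an hour before the long-running save.
   3600 is an example value, capped by the app server's max time limit. :)
xdmp:set-request-time-limit(3600),
xdmp:save("toto.csv",
  document {
    for $data in collection("http://book/polar")
    return $data
  })
```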
I would suggest you take a look at MLCP as well.
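For example, an MLCP export of that collection might look like the invocation below. Host, port, database name, and output path are placeholders, and note that MLCP exports documents as individual files rather than producing a single CSV, so a post-processing step would still be needed for CSV output:

```shell
# Hedged sketch: export every document in the collection to local disk.
# Connection details, database name, and output path are placeholders.
mlcp.sh export -mode local \
  -host localhost -port 8000 \
  -username "$MARKLOGIC_USERNAME" -password "$MARKLOGIC_PASSWORD" \
  -database mydb \
  -collection_filter "http://book/polar" \
  -output_file_path /tmp/polar-export
```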
Upvotes: 4