Reputation: 49
We have searched for and tried several ways to delete documents in MarkLogic, but have not found a reliable way to delete millions of them without having to check afterwards that they have in fact been deleted. The one approach we found that deletes the whole collection is the xdmp:collection-delete function, but it is REALLY slow. The quickest way we've found is the following XQuery. However, we must run it, wait for it to finish, check whether any documents are left in the collection, and repeat a few more times until everything has been deleted.
We are by no means proficient at coding XQuery so we may be missing something.
How does the MarkLogic community delete millions of documents quickly and reliably?
Here is the XQuery we use:
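xquery version "1.0-ml";
(: Delete every document in the collection in spawned batches of $page-size URIs. :)
let $page-size := 1000
let $uris := cts:uris('', (), cts:collection-query('OurCollection'))
(: ceiling avoids spawning an empty trailing batch when the count is an exact multiple :)
let $pages := xs:integer(ceiling(count($uris) div $page-size))
return
  for $page in (1 to $pages)
  let $start := (($page - 1) * $page-size) + 1
  (: the third argument of subsequence is a length, not an end position;
     passing an end position makes the batches overlap :)
  let $batch := subsequence($uris, $start, $page-size)
  return
    xdmp:spawn-function(function() {
      for $uri in $batch
      return xdmp:document-delete($uri)
    });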
Upvotes: 1
Views: 390
Reputation: 66723
Spawning chunks of deletes like you have shown is one way, and the easiest without any external tools. You could bump up the task server threads to get some more concurrency, but it could still take a while.
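Once the task server queue has drained, a quick count shows whether anything was left behind. A minimal sketch, assuming the collection name from the question; xdmp:estimate resolves the count from the indexes, so it stays cheap even for millions of documents:

```xquery
xquery version "1.0-ml";
(: Number of documents still in the collection; 0 means the delete is complete. :)
xdmp:estimate(
  cts:search(fn:doc(), cts:collection-query('OurCollection'))
)
```

If the number is not zero, the task server error log is the place to look for the batches that failed.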
Another very common means of performing bulk activities, such as deleting millions of docs, is to run a CoRB job. You can control the number of threads that the job uses for concurrent requests, and send the requests across all E-nodes using either a load balancer or CoRB's multi-host load balancing feature.
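A CoRB job of this kind is typically split into two modules: a URIs module that selects the work, and a process module that is invoked for each URI. A minimal sketch, assuming the collection name from the question and the hypothetical file names uris.xqy and delete.xqy; check the details against the CoRB documentation for the version you run.

```xquery
xquery version "1.0-ml";
(: uris.xqy (hypothetical name): CoRB expects the total count first, then the URIs. :)
let $uris := cts:uris('', (), cts:collection-query('OurCollection'))
return (count($uris), $uris)
```

```xquery
xquery version "1.0-ml";
(: delete.xqy (hypothetical name): CoRB passes each URI in as the external variable $URI. :)
declare variable $URI as xs:string external;
xdmp:document-delete($URI)
```

The two modules are named in the job options through URIS-MODULE and PROCESS-MODULE, and THREAD-COUNT sets how many requests run concurrently. Because each URI is handed to exactly one request, no two threads try to delete the same document.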
Upvotes: 1