Doug Gibbs

Reputation: 49

How does one delete millions of documents in MarkLogic?

We have searched for and tried several ways to delete documents in MarkLogic, but have not found a reliable way to delete millions of them without having to verify afterwards that they were in fact deleted. The xdmp:collection-delete function does delete the whole collection, but it is very slow. The quickest approach we have found is the XQuery below, but we have to run it, wait for it to finish, check whether any documents remain in the collection, and repeat a few more times until everything has been deleted.

We are by no means proficient at coding XQuery so we may be missing something.

How does the MarkLogic community delete millions of documents quickly and reliably?

Here is the XQuery we use:

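xquery version "1.0-ml";
(: Delete every document in the collection, one spawned task per batch of URIs. :)
let $page-size := 1000
let $uris := cts:uris('', (), cts:collection-query('OurCollection'))
(: Integer ceiling, so an exact multiple of $page-size does not spawn an empty task. :)
let $pages := (count($uris) + $page-size - 1) idiv $page-size
return 
  for $page in (1 to $pages)
  let $start := (($page - 1) * $page-size) + 1 
  (: The third argument of subsequence() is a length, not an end position.
     Passing an end position makes the batches overlap, so later tasks try to
     delete URIs that an earlier task has already removed. :)
  let $batch := subsequence($uris, $start, $page-size)
  return
    xdmp:spawn-function(function(){
      for $uri in $batch
      return
        xdmp:document-delete($uri)
    });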

Upvotes: 1

Views: 390

Answers (1)

Mads Hansen

Reputation: 66723

Spawning chunks of deletes like you have shown is one way, and the easiest without any external tools. You could bump up the task server threads to get some more concurrency, but it could still take a while.
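As a rough sketch of raising the task server thread count, something like the following could be run against a group through the Admin API. The group name "Default" and the value 32 are assumptions for illustration only; pick a value that suits your hardware, and check whether the change needs a restart in your version.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Assumption: the task server belongs to the group named "Default". :)
let $config := admin:get-configuration()
let $group  := admin:group-get-id($config, "Default")
(: Assumption: 32 is an illustrative thread count, not a recommendation. :)
let $config := admin:taskserver-set-threads($config, $group, 32)
return admin:save-configuration($config)
```

More threads let more spawned delete tasks run at once, but they also compete with regular requests for the same hosts, so it may be worth lowering the value again once the cleanup is done.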

Another very common means of performing bulk activities, such as deleting millions of docs, is to run a CoRB job. You can control the number of threads that the job uses for concurrent requests, and spread the requests across all E-nodes using either a load balancer or CoRB's multi-host load balancing feature.
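A minimal sketch of what such a CoRB job might look like is below. The file names, connection string, thread count, and batch size are all placeholders chosen for illustration, so adjust them for your environment.

A hypothetical corb.properties:

```
# Placeholder credentials, host, and port
XCC-CONNECTION-URI=xcc://user:password@localhost:8000
# Number of concurrent worker threads (illustrative value)
THREAD-COUNT=8
# Module that selects the URIs to process
URIS-MODULE=uris.xqy|ADHOC
# Module that is run for each batch of URIs
PROCESS-MODULE=delete.xqy|ADHOC
# URIs handed to each PROCESS-MODULE call (illustrative value)
BATCH-SIZE=100
```

The URIs module (uris.xqy here) returns the count followed by the URIs:

```xquery
xquery version "1.0-ml";
let $uris := cts:uris('', (), cts:collection-query('OurCollection'))
return (count($uris), $uris)
```

The process module (delete.xqy here) receives the batch in the external variable $URI. This assumes the default batch delimiter of ';' and that none of your URIs contain that character:

```xquery
xquery version "1.0-ml";
declare variable $URI as xs:string external;

(: With BATCH-SIZE greater than 1, $URI holds several URIs joined by ';'. :)
for $uri in fn:tokenize($URI, ';')
return xdmp:document-delete($uri)
```

Each batch runs as its own request, so a failure in one batch does not roll back the others, and the job can be rerun to pick up whatever remains.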

https://help.marklogic.com/Knowledgebase/Article/View/best-practices-for-improving-the-performance-of-large-collections-deletes

Upvotes: 1
