Hapeka

Reputation: 80

Expanded tree cache full when performing many queries

I am implementing a batch process that needs to execute a large number of search queries in MarkLogic to find the documents to be modified. The query I am using looks like this:

cts:search(/ch:haufe-document,
  cts:and-query((
    cts:element-range-query(fn:QName("http://idesk.haufe-lexware.com/document-meta","rootId"), "=", xs:string($root-id)),
    cts:element-range-query(fn:QName("http://contenthub.haufe-lexware.com/haufe-document","application"), "=", xs:string($APPLICATION-ID))
  ))
)

The $root-id is different for each query; the $APPLICATION-ID is a constant value. Usually these queries return a small number of documents (fewer than 10), sometimes up to 150, and they work fine. Only when many such queries are executed in a row (possibly more than 100000 for one batch job) do I eventually get back an error like this:

XDMP-EXPNTREECACHEFULL: cts:search(fn:collection()/ch:haufe-document, cts:and-query((cts:element-range-query(fn:QName("http://idesk.haufe-lexware.com/document-meta","rootId"), "=", "HI14429659", ("collation=http://marklogic.com/collation/"), 1), cts:element-range-query(fn:QName("http://contenthub.haufe-lexware.com/haufe-document","application"), "=", "idesk", ("collation=http://marklogic.com/collation/"), 1)), ())) -- Expanded tree cache full on host some-host.cloudapp.net uri /content/idesk/c9103265-0a44-496b-b2b1-617b0b042208/HI14429659.xml

When I execute the same query manually, it runs without problems and returns very few results (just one in most cases). The number of documents matching /ch:haufe-document is about 3 million, and it does not change much during processing (the documents are only modified). The database contains an additional 1.5 million metadata documents, which are added during processing.

The strange thing is that the first two batch jobs, each processing more than 600000 documents, worked fine. But the third job failed with the error above, and since then only very small jobs (~30000 docs) can be processed successfully.

I already tried increasing the size of the expanded tree cache, but it didn't help. I also tried an "unfiltered" search, but the error persists.

I would appreciate any hint as to what the problem could be.

Update: One thing I didn't mention, because I didn't realize it might be relevant, is this: the whole process is implemented as a REST extension, which is called from a Java application. A POST request is made that contains an XML document with a list of document IDs to be processed, and this list can be very long (>100000 entries).

Upvotes: 3

Views: 192

Answers (3)

Hapeka

Reputation: 80

The solution I found is this: I modified the Java application so that it does not send all data to MarkLogic at once, but splits it up into chunks of 10000 IDs. Now the error is gone. The downside is that the change is done in several transactions, so some modifications become visible before everything is finished. But for my use case this is acceptable.
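
For illustration, here is a minimal sketch of that chunking on the Java side. The endpoint name (/v1/resources/process-docs) and the <ids> payload structure are assumptions for the sketch, not taken from the original application:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class ChunkedBatchClient {

    // Chunk size that worked for the use case described above.
    private static final int CHUNK_SIZE = 10_000;
    // Hypothetical endpoint of the REST extension; adjust host, port, and name.
    private static final String ENDPOINT =
            "http://localhost:8010/v1/resources/process-docs";

    public static void sendInChunks(List<String> allIds) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // POST the IDs in slices of 10,000 instead of one huge request,
        // so each invocation of the extension handles a bounded amount of data.
        for (int start = 0; start < allIds.size(); start += CHUNK_SIZE) {
            List<String> chunk =
                    allIds.subList(start, Math.min(start + CHUNK_SIZE, allIds.size()));
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(ENDPOINT))
                    .header("Content-Type", "application/xml")
                    .POST(HttpRequest.BodyPublishers.ofString(toXml(chunk)))
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() >= 300) {
                throw new IllegalStateException(
                        "Chunk at offset " + start + " failed: " + response.statusCode());
            }
        }
    }

    // Builds the XML payload; the <ids>/<id> structure is assumed here.
    private static String toXml(List<String> ids) {
        StringBuilder sb = new StringBuilder("<ids>");
        for (String id : ids) {
            sb.append("<id>").append(id).append("</id>");
        }
        return sb.append("</ids>").toString();
    }
}

Each POST then runs as its own transaction on the MarkLogic side, which is what makes the modifications visible incrementally.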

Upvotes: 0

Looking at that size, you are likely not going to solve the issue by increasing the memory. From the looks of it, you are essentially trying to inhale the entire database into memory. More batches mean more data in memory in parallel.

Step back and try to figure out what you are trying to accomplish. It would seem that whatever is processing those results cannot do the work all at once, so think about returning references instead of full documents.

Here is an example to start you off by returning just the URIs. The calling code could then fetch the docs at the time of processing each one (keeping memory usage lower).

cts:uris((), (), cts:element-query(xs:QName("ch:haufe-document"),
  cts:and-query((
    cts:element-range-query(fn:QName("http://idesk.haufe-lexware.com/document-meta","rootId"), "=", xs:string($root-id)),
    cts:element-range-query(fn:QName("http://contenthub.haufe-lexware.com/haufe-document","application"), "=", xs:string($APPLICATION-ID))
  ))
))

I use cts:uris() as an example starting point.
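
To complete the picture, here is a sketch (my own, not from the answer above) of how the Java caller could then process one document at a time via the MarkLogic REST API's GET /v1/documents endpoint; authentication and error handling are omitted for brevity:

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class UriDrivenProcessor {

    private final HttpClient client = HttpClient.newHttpClient();

    // Fetch and handle one document per iteration, so only a single
    // document is held by the client at a time.
    public void processAll(List<String> uris) throws Exception {
        for (String uri : uris) {
            HttpRequest get = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8010/v1/documents?uri="
                            + URLEncoder.encode(uri, StandardCharsets.UTF_8)))
                    .GET()
                    .build();
            String doc = client.send(get, HttpResponse.BodyHandlers.ofString()).body();
            handle(doc);
        }
    }

    private void handle(String doc) {
        // Hypothetical per-document processing step.
    }
}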

Upvotes: 0

Mads Hansen

Reputation: 66781

The query that hits the Expanded Tree Cache error may not be pulling a lot of documents. It may just be the last straw that broke the camel's back.

Resolving XDMP-EXPNTREECACHEFULL errors

MarkLogic stores document fragments on disk in a compressed format. When a query needs to actually retrieve elements or values, or otherwise traverse the contents of one of these fragments, the fragment is uncompressed and cached in the expanded tree cache.

Consequently, the expanded tree cache needs to be large enough to maintain a copy of every expanded XML fragment that is simultaneously needed during query processing.

The error message XDMP-EXPNTREECACHEFULL: Expanded tree cache full means that MarkLogic has run out of room in the expanded tree cache during query evaluation, and that consequently it cannot continue evaluating the complete query.

There are a few options to handle this, depending upon what your needs and capacity are.

  • If you have enough memory available to allocate, you can bump up the ETC limit and provide more memory to service those requests.
  • If you have found some greedy and inefficient queries that are pulling a ton of docs at once, see if they can be broken out into smaller transactions.
  • If you have too many concurrent transactions processing too many docs, limit the number of appserver threads or lower the thread count on your batch jobs (a client-side sketch follows this list).
  • Configure a maximum readSize limit for auto-cancellation of transactions that exceed those limits: https://docs.marklogic.com/guide/performance/request_monitoring#id_10815
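
For that third point, here is a minimal client-side sketch of capping batch concurrency with a fixed thread pool (the pool size of 4 is only illustrative):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThrottledBatchRunner {

    public static void run(List<Runnable> chunkJobs) throws InterruptedException {
        // A small fixed pool bounds how many chunks hit MarkLogic concurrently,
        // which in turn bounds how many fragments are expanded at the same time.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        chunkJobs.forEach(pool::submit);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}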

Upvotes: 0
