thichxai

Reputation: 1133

MarkLogic export fails due to time limit exceeded

I am exporting huge PDF files (some of the PDFs are over 1 GB), and I have already reduced -thread_count to 4. What else do I need to do to avoid the timeout? Thanks

ERROR contentpump.DatabaseContentReader: RuntimeException reading /pdf/docIns/docIns-222581.pdf :com.marklogic.xcc.exceptions.StreamingResultException: RequestException instantiating ResultItem 301805: Time limit exceeded
22/01/24 17:48:09 INFO contentpump.DatabaseContentReader: host name: xxx.us-central.compute.internal
22/01/24 17:48:09 INFO contentpump.DatabaseContentReader: Retrying connect
22/01/24 17:53:16 INFO contentpump.LocalJobRunner:  completed 3%

Upvotes: 0

Views: 165

Answers (2)

Rob S.

Reputation: 3609

Thread count won't make a difference here, because each document is read by only one thread at a time. The limiting factor is either network transfer time or the time to read the file off MarkLogic's disk and into available memory (or some combination of the two).

You could try grabbing the document over REST (/v1/documents/ endpoint) and see if that is quicker. You could also use xdmp:zip-create to try and compress it within MarkLogic and see if downloading the compressed file is fast enough.
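As a quick test of the REST route, a single document can be fetched with curl against the /v1/documents endpoint. This is a sketch only: the hostname, port 8000, and the digest credentials are assumptions to adapt; the document URI is taken from the error log above.

```shell
# Fetch one large PDF through the MarkLogic Client REST API.
# localhost:8000 and admin:password are placeholders for illustration.
curl --digest -u admin:password \
  -o docIns-222581.pdf \
  "http://localhost:8000/v1/documents?uri=/pdf/docIns/docIns-222581.pdf"
```

Timing this against the mlcp export for the same document would show whether the REST path is actually faster for these large binaries.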

Alternatively, consider using MarkLogic to store a URL alongside the searchable (meta)data to grab the document from something else (like a CDN or S3 for example).

Upvotes: 1

You could consider increasing the request time limit of the http server. This page explains the settings: https://docs.marklogic.com/admin-help/http-server

If you are managing your cluster via the REST API, you can look here: https://docs.marklogic.com/REST/POST/manage/v2/servers
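As a sketch of that REST approach, raising the request timeout on an app server might look like the following. The server name App-Services, group Default, credentials, and the 3600-second value are all assumptions to adjust for your cluster.

```shell
# Raise request-timeout on an app server via the Management REST API
# (runs on port 8002 by default). All names and values here are
# illustrative placeholders, not values from the question.
curl --digest -u admin:password -X PUT \
  -H "Content-Type: application/json" \
  -d '{"request-timeout": 3600}' \
  "http://localhost:8002/manage/v2/servers/App-Services/properties?group-id=Default"
```

Note that raising the timeout on the app server mlcp connects to is what matters; a generous value buys time for the 1 GB+ documents to stream out.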

Also, there are other options for large binary content. You could store the PDF as an external binary in a location clients can reach directly, such as S3, and have MarkLogic return just the reference; clients then fetch the file themselves, assuming they have credentials to read from that storage. On past projects I have served large binaries from S3, and at other times through a separate server acting as a proxy using a one-time token.

Upvotes: 1
