user3794298

Reputation: 21

Performance issue with batch insertion into MarkLogic

I have a requirement to insert 10,000 docs into MarkLogic in less than 10 seconds.

I tested on a single-node MarkLogic server in the following way:

  1. use xdmp:spawn to pass each doc insertion task to the task server;
  2. use xdmp:document-insert without specifying a forest explicitly;
  3. the task server has 8 threads to process tasks;
  4. CPF is enabled.

The performance is very bad: it took 2 minutes to create the 10,000 docs. I'm sure the performance would be better in a cluster environment, but I'm not sure whether it could finish in less than 10 seconds.

Please advise on how to improve the performance.
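To put the gap in numbers, here is a small Python calculation using only the figures quoted above (10,000 docs, a 10-second target, and the observed 2 minutes). It is plain arithmetic, not a benchmark, and the variable names are just for illustration.

```python
# Figures taken from the question; nothing here is measured independently.
DOCS = 10_000
TARGET_SECONDS = 10
OBSERVED_SECONDS = 120  # "2 minutes"

required_rate = DOCS / TARGET_SECONDS    # docs per second needed
observed_rate = DOCS / OBSERVED_SECONDS  # docs per second achieved
speedup_needed = required_rate / observed_rate

print(f"required: {required_rate:.0f} docs/sec")
print(f"observed: {observed_rate:.1f} docs/sec")
print(f"speedup needed: {speedup_needed:.0f}x")
```

So the target is about 1,000 docs/sec against roughly 83 docs/sec observed, a factor of about 12.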

Upvotes: 2

Views: 343

Answers (3)

Aaron Rosenbaum

Reputation: 61

Assuming a 2-socket server, 128GB-256GB of RAM, and fast IO (400-800MB/sec sustained):

  • Appropriate number of forests (12 primary or 6 primary/6 secondary)
  • More than 8 threads assuming enough cores
  • CPF off

Turn on perf history, look in metrics, and you will see where the bottleneck is.

SSD is not required - just IO throughput...which multiple spinning disks provide without issue.

Upvotes: 1

grtjn

Reputation: 20414

If you need a fast load, I wouldn't use xdmp:spawn for each individual document, nor would I use CPF. But 2 minutes for 10k docs doesn't necessarily sound slow. On the other hand, I have reached up to 3k docs/sec, but without range indexes, transforms, or anything else of that kind, and on a very fast disk (e.g. SSD).
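One common alternative to spawning a task per document is to group documents into batches, so that each spawned task inserts many documents. The Python sketch below only illustrates the grouping and the resulting task count; it does not talk to MarkLogic. The URI pattern, the `batches` helper, and the batch size of 500 are assumptions made for illustration, not values from the question.

```python
from typing import Iterator, List, Sequence


def batches(uris: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `uris`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(uris), batch_size):
        yield list(uris[start:start + batch_size])


# Hypothetical URIs; the question does not say how documents are named.
uris = [f"/load/doc-{i}.xml" for i in range(10_000)]

BATCH_SIZE = 500  # assumed value, tune against real measurements
tasks = list(batches(uris, BATCH_SIZE))

print(len(uris), "tasks if each document is spawned on its own")
print(len(tasks), "tasks if each spawned task inserts", BATCH_SIZE, "documents")
```

With these assumed numbers, 10,000 spawned tasks become 20, which cuts per-task overhead. The right batch size depends on the hardware and document size, so it is worth measuring a few values.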

HTH!

Upvotes: 1

mblakele

Reputation: 7840

I would start by gathering more information. What version of MarkLogic is this? What OS is it running on? What's the CPU? RAM? What's the storage subsystem? How many forests are attached to the database?

Then gather OS-level metrics, to see if one of the subsystems is an obvious bottleneck. For now I won't speculate beyond that.

Upvotes: 1
