David542

Reputation: 110163

Possible to index 1M docs/sec in Elasticsearch?

I am trying to optimize indexing speed in Elasticsearch. We rebuild our indexes every hour, so the faster we can reindex our data, the less lag we have.

I came across this article, which talks about reaching a re-indexing throughput of 100K: https://thoughts.t37.net/how-we-reindexed-36-billions-documents-in-5-days-within-the-same-elasticsearch-cluster-cd9c054d1db8#.4w3kl9ebf, and this StackOverflow question, which achieves a higher rate: ElasticSearch - high indexing throughput.

My question is whether it is possible to achieve a sustained indexing throughput of 1 million documents per second, and if so, how?

Upvotes: 1

Views: 1259

Answers (1)

miku

Reputation: 188024

It will depend on a few factors, but why should it be impossible? Here are a few key factors that affect indexing speed:

  • size of the documents (smaller is faster)
  • number of cores and size of memory (more is faster)
  • number of machines (more is faster)
  • number of replicas (fewer is faster)
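Of these factors, the replica count is the one you can change on a live index. A minimal sketch, assuming you drop replicas to 0 before the hourly reindex and restore them afterwards through the index settings endpoint (`PUT /<index>/_settings`). The index name `my-index` and the helper `replica_settings_request` are made up for illustration, and the code only builds the request, it does not send it:

```python
import json


def replica_settings_request(index_name, replicas):
    """Build (method, path, body) for an index settings update.

    Hypothetical helper: it only prepares the request, so any HTTP
    client can send it to the cluster.
    """
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return "PUT", f"/{index_name}/_settings", body


# "my-index" is a placeholder name, not taken from the question.
before_load = replica_settings_request("my-index", 0)  # fewer replicas, faster indexing
after_load = replica_settings_request("my-index", 1)   # restore redundancy afterwards

print(before_load)
print(after_load)
```

Keep in mind that with 0 replicas the index has no redundancy while the load is running.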

As an example, with small documents and a single eight-core machine, I was able to index at about 70k-120k docs/s. Throw in a few more cores or machines and you could approach 1M docs/s.
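To put a rough number on "a few more machines": a back-of-the-envelope sketch, assuming throughput scales linearly with the number of machines. That is an optimistic assumption (coordination and replication overhead usually make real clusters scale worse), and `machines_needed` is just an illustrative helper:

```python
import math


def machines_needed(target_rate, per_machine_rate):
    """Machines required for target_rate, assuming perfectly linear scaling."""
    return math.ceil(target_rate / per_machine_rate)


TARGET = 1_000_000  # docs/s, the rate asked about in the question

# Per-machine rates are the 70k-120k docs/s measured on one eight-core box.
best_case = machines_needed(TARGET, 120_000)
worst_case = machines_needed(TARGET, 70_000)

print(f"between {best_case} and {worst_case} machines")  # between 9 and 15 machines
```

So under this assumption you would be looking at roughly 9 to 15 machines of that class, and probably more in practice.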


Update: Another test run with Elasticsearch 6.1.0, on a single 32-core E5 with a 64 GB JVM heap. Here, esbulk could index about 330,000 docs/s, using 10M small documents of 20-40 bytes each.
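For a sense of scale, the numbers from that run can be turned into a wall-clock time and a raw data rate. This is only arithmetic on the figures above; it counts the document bytes alone and ignores bulk request overhead, so treat it as a rough sketch:

```python
def run_stats(doc_count, docs_per_sec, min_doc_bytes, max_doc_bytes):
    """Return (seconds, low_mb_per_sec, high_mb_per_sec) for an indexing run.

    MB here means 10**6 bytes; only the raw document bytes are counted.
    """
    seconds = doc_count / docs_per_sec
    low = docs_per_sec * min_doc_bytes / 1e6
    high = docs_per_sec * max_doc_bytes / 1e6
    return seconds, low, high


# 10M documents of 20-40 bytes at about 330,000 docs/s, as in the test run.
seconds, low_mb, high_mb = run_stats(10_000_000, 330_000, 20, 40)
print(f"{seconds:.1f} s, {low_mb:.1f}-{high_mb:.1f} MB/s of document data")
```

The whole run takes about half a minute and moves only a few megabytes of document data per second, which suggests the documents in this test are very small and larger documents would likely index at a lower docs/s rate.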


Disclaimer: I wrote esbulk. The README contains a few measurements; the maximum at the moment is about 300k docs/s.

Upvotes: 2
