Reputation: 381
I have a basic Elasticsearch cluster at the moment, in which I am using a river to index data. I want to scale for future growth in two phases. The number of documents indexed per second is likely to be the bottleneck.
How should I go about it?
Thanks in advance!
Edit:
I am trying to index the Twitter stream.
Each document = around 2 KB.
Hardware is flexible. Right now I have magnetic disks (and 50 GB of RAM), but getting SSDs (and a better configuration) is no big deal.
Upvotes: 0
Views: 622
Reputation: 663
A few highlights from experiments and articles:
Since you will be doing a lot of writing, make sure you start with a good number of primary shards. You can base that decision on the number of nodes you will have or need: basically, you want your primary shards distributed across different nodes so they can share the work. You cannot change the number of primary shards once the index is created, so think it through (a small sketch of this appears after the list).
Do not assign more than 50% of your machine's memory to the Elasticsearch heap; the rest will be used by Lucene through the OS file-system cache (see http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/heap-sizing.html).
Use an SSD. When indexing, I/O plays a big role (see http://www.elasticsearch.org/blog/performance-considerations-elasticsearch-indexing/).
Generally: I/O > Memory > Multiple CPU Cores > Fast single CPU (see http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/hardware.html)
Pretty much every setup is unique, so the best way to find the optimal configuration for your case is to try it out. Elasticsearch has a great monitoring tool called Marvel (http://www.elasticsearch.org/overview/marvel/).
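To make the first point concrete, here is a minimal sketch using the official elasticsearch-py client. The index name ("tweets"), the shard counts, the 3-node assumption, and the fake ~2 KB documents are illustrative assumptions, not prescriptions, and exact parameter names can vary between client versions. It fixes the primary shard count at index creation and then measures bulk-indexing throughput:

```python
# A minimal sketch, assuming the official elasticsearch-py client and a
# cluster reachable at localhost:9200. Names and numbers are illustrative.
import time

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://localhost:9200"])

# 1) Create the index with an explicit number of primary shards up front,
#    since this cannot be changed later. Here we assume a 3-node cluster
#    and give each node two primaries to share the write load.
es.indices.create(
    index="tweets",
    body={
        "settings": {
            "number_of_shards": 6,     # fixed at creation time
            "number_of_replicas": 1,   # can be changed at any time
        }
    },
)

# 2) Benchmark indexing throughput with the bulk API rather than one
#    request per document. Fake ~2 KB documents stand in for tweets.
def fake_tweets(count):
    padding = "x" * 2000  # roughly 2 KB per document
    for i in range(count):
        yield {
            "_index": "tweets",
            "_type": "tweet",   # needed on ES 1.x; drop on recent versions
            "_source": {"id": i, "text": padding},
        }

start = time.time()
success, errors = helpers.bulk(es, fake_tweets(50000), chunk_size=1000)
elapsed = time.time() - start
print("indexed %d docs in %.1fs (%.0f docs/s)" % (success, elapsed, success / elapsed))
```

Running something like this against your own hardware (magnetic disks vs. SSD) while watching Marvel is the quickest way to see where your docs-per-second ceiling actually is.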
Have fun!
Upvotes: 1