P. Paul

Reputation: 403

Should I use sharding/replication on a single machine in elasticsearch?

I have a large dataset in a single Elasticsearch index. I have only one physical machine, and that is not going to change in the near future.

Is there any point in using sharding and/or replication if I can't have more nodes to run elasticsearch on? Will it still improve performance, or should I stick to having just one shard?

Upvotes: 3

Views: 1244

Answers (2)

Juan Carlos Alafita

Reputation: 280

Complementing what Opster said.

If you configure replicas anyway, the ES cluster status becomes yellow, because a replica shard cannot be assigned to the same node where its primary shard resides. Every replica shard then stays unassigned and increases the unassigned_shards counter.

Check the status of your cluster:

curl -XGET "http://localhost:9200/_cluster/health?pretty"

{
  "cluster_name" : "es-test",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 7,
  "number_of_data_nodes" : 7,
  "discovered_master" : true,
  "active_primary_shards" : 8617,
  "active_shards" : 11975,
  "relocating_shards" : 8,
  "initializing_shards" : 0,
  "unassigned_shards" : 46
}
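To make the yellow status concrete, here is a minimal Python sketch that interprets a health response like the one above. The helper name explain_health and its messages are my own illustration, not part of any Elasticsearch client; only the field names come from the response.

```python
def explain_health(health: dict) -> str:
    """Summarise a _cluster/health response (hypothetical helper)."""
    status = health.get("status")
    unassigned = health.get("unassigned_shards", 0)
    if status == "green":
        return "green: all primary and replica shards are assigned"
    if status == "yellow":
        # Yellow means every primary is active, so whatever is
        # unassigned must be replica shards.
        return f"yellow: primaries are active, {unassigned} replica shard(s) unassigned"
    if status == "red":
        return "red: at least one primary shard is not active"
    return f"unknown status: {status!r}"


# Values copied from the response shown above.
sample = {"status": "yellow", "active_primary_shards": 8617, "unassigned_shards": 46}
print(explain_health(sample))
```

On a single node with replicas configured, you would expect the yellow branch, with unassigned_shards equal to the number of replica shards you asked for.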

"TIP: The number of shards you can hold on a node will be proportional to the amount of heap you have available, but there is no fixed limit enforced by Elasticsearch. A good rule-of-thumb is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards, but the further below this limit you can keep it the better. This will generally help the cluster stay in good health."

https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster

This applies to both primary and replica shards: when you have many indices, and therefore many shards, you start to reach the limit of shards per node. Take this into account if you want to change the number of primary shards for your new indices, or if you want to reindex in order to change the primary shard settings of your existing indices.
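The rule of thumb quoted above is simple arithmetic, so it can be sketched in a few lines of Python. The function names are hypothetical, and the default of 20 shards per GB of heap is just the figure from the quoted tip, not a limit Elasticsearch enforces.

```python
def max_shards_for_heap(heap_gb: float, shards_per_gb: int = 20) -> int:
    """Upper bound on shards per node from the quoted rule of thumb."""
    return int(heap_gb * shards_per_gb)


def within_guideline(shard_count: int, heap_gb: float) -> bool:
    """True if a node's shard count (primaries plus replicas) is under the bound."""
    return shard_count <= max_shards_for_heap(heap_gb)


# The example from the tip: a node with a 30 GB heap.
print(max_shards_for_heap(30))    # 600
print(within_guideline(750, 30))  # False
```

Count primaries and replicas together when checking a node, and treat the result as a ceiling to stay well below rather than a target.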

Upvotes: 2

Amit

Reputation: 32386

On a single machine, replication doesn't make sense. It is mainly used for high availability (if the machine holding one copy goes down, you can still serve requests from the machine hosting the replica) and for better search performance, since a search can be served by any replica. On a single machine neither use case applies, and even if you try, ES will not allocate a replica of a shard on the same node as its primary.
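If you decide to run without replicas on your single node, the replica count can be changed through the index settings API (PUT /<index>/_settings). Below is a minimal Python sketch that only builds the request. The index name my-index is a placeholder, the helper name is hypothetical, and the address is the localhost:9200 used elsewhere in this thread.

```python
import json
import urllib.request


def build_replica_settings_request(base_url: str, index: str, replicas: int) -> urllib.request.Request:
    """Build (but do not send) a request that sets number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# "my-index" is a placeholder index name.
req = build_replica_settings_request("http://localhost:9200", "my-index", 0)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To apply it against a running node: urllib.request.urlopen(req)
```

With zero replicas there is nothing left unassigned, but there is also no second copy of the data, so backups (for example snapshots) become the only protection against losing the machine.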

Multiple primary shards are more complicated, as the answer depends on various factors. If you have good disk and RAM available and a huge amount of data, then a single primary shard means large segments; a segment of more than 5 GB is big, is not eligible for segment merging, and is difficult to cache. On the other hand, too many small segments also hurt search performance. You should also know that ES uses one thread per shard when searching, so more shards for a single index means more threads on the same machine are involved in searching the data. The best approach is to do some benchmarking based on your data and infrastructure and choose what works best for your use case.

Upvotes: 3
