medium

Reputation: 4236

Distributing bulk ingest across nodes in Elasticsearch 2.4

I am currently running a 10-node Elasticsearch 2.4 cluster and bulk ingesting data using Apache NiFi via the PutElasticsearch processor.

I am curious how Elasticsearch distributes the load of bulk ingests across the nodes. If I send a bulk ingest command only to the master node of my Elasticsearch cluster, will the master distribute the ingest load to all of the other nodes in the cluster, say with a round-robin type of strategy?

With respect to NiFi, the PutElasticsearch processor lets me list the IP addresses of all the Elasticsearch nodes in the Elasticsearch Hosts configuration. Up to this point I have entered only the master node's IP because I assumed it was distributing the load. Is it worth listing the IP addresses of all nodes in the cluster, or just the master node?

Upvotes: 0

Views: 348

Answers (1)

Egor

Reputation: 271

It depends on which load you want to distribute. Essentially the process looks like this:

  1. The client sends the request to a coordinator node. This is simply the node that received the request; it can be any node in the cluster, not only the master (the master role serves different purposes).
  2. The coordinator node works out which shards the documents need to be routed to and which nodes host those shards, then routes the documents to these nodes.
  3. Once a primary shard is updated, its host node forwards the documents to the nodes hosting the replicas.
  4. When the process is complete, the coordinator node responds to the client.
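
Step 2 can be illustrated with a rough sketch. Elasticsearch picks the shard as hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID. The sketch below is only an approximation: Elasticsearch uses a Murmur3 hash, while `zlib.crc32` is used here as a stand-in to avoid extra dependencies, and the names `shard_for`, `group_by_node` and the `shard_to_node` mapping are hypothetical, not Elasticsearch APIs.

```python
import zlib
from collections import defaultdict


def shard_for(routing, num_primary_shards):
    """Pick a primary shard for a routing value (the document ID by default).

    Assumption: crc32 stands in for the Murmur3 hash Elasticsearch really uses,
    so the shard numbers will not match a real cluster.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def group_by_node(doc_ids, num_primary_shards, shard_to_node):
    """Group document IDs by the node hosting their primary shard.

    shard_to_node is a hypothetical mapping of shard number -> node name,
    which a coordinator node would know from the cluster state.
    """
    per_node = defaultdict(list)
    for doc_id in doc_ids:
        shard = shard_for(doc_id, num_primary_shards)
        per_node[shard_to_node[shard]].append(doc_id)
    return dict(per_node)
```

With three primary shards on three nodes, a bulk request of 20 documents would be split into at most three per-node sub-requests, whichever node received the original request.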

So the indexing work is distributed among the nodes hosting the target shards and their replicas; however, all of the coordination is done by the node that received the request. It may therefore make sense to send requests to different nodes to spread the coordination work.
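
As a minimal sketch of spreading the coordination work from the client side, the batches can be rotated over a list of hosts so that each bulk request lands on a different coordinator. The host names and the helper names `bulk_body` and `assign_batches` are made up for illustration; the body layout follows the REST `_bulk` format (one action line, then one source line per document, newline-terminated). This does not describe what NiFi's PutElasticsearch processor does internally.

```python
import itertools
import json


def bulk_body(index, doc_type, docs):
    """Build a newline-delimited body for the REST _bulk endpoint.

    docs is an iterable of (doc_id, source_dict) pairs.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


def assign_batches(hosts, batches):
    """Pair each batch with a host in round-robin order.

    Each host then acts as coordinator for roughly the same number of requests.
    """
    rotation = itertools.cycle(hosts)
    return [(next(rotation), batch) for batch in batches]
```

Each `(host, batch)` pair would then be sent with whatever HTTP client is at hand, posting `bulk_body(...)` to that host's `/_bulk` endpoint.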

It is also possible to configure nodes to have particular roles; have a look at the docs.

Upvotes: 1
