Reputation: 902
I read, possibly on Stack Overflow, that the es-hadoop / es-spark projects use bulk indexing. If so, is the default batch size the same as the BulkProcessor default (5 MB)? Is there any configuration to change this?
I am using
JavaEsSparkSQL.saveToEs(dataset, index)
in my code, and I want to know which configuration options are available to tune write performance. Is this also related to how the dataset is partitioned?
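For context, the write boils down to this (a trimmed sketch; the cluster address, input path, and index name are placeholders):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL;

public class SaveToEsExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("save-to-es")
                .config("es.nodes", "localhost:9200") // placeholder cluster address
                .getOrCreate();

        // Placeholder input; any Dataset<Row> is written the same way.
        Dataset<Row> dataset = spark.read().json("/path/to/input.json");

        // Writes the dataset with the default bulk settings.
        JavaEsSparkSQL.saveToEs(dataset, "my-index");
    }
}
```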
Upvotes: 0
Views: 1573
Reputation: 902
Found the settings on the es-hadoop configuration page:
es.batch.size.bytes (default 1mb)
Size (in bytes) for batch writes using Elasticsearch bulk API. Note the bulk size is allocated per task instance. Always multiply by the number of tasks within a Hadoop job to get the total bulk size at runtime hitting Elasticsearch.
es.batch.size.entries (default 1000)
Size (in entries) for batch writes using Elasticsearch bulk API - (0 disables it). Companion to es.batch.size.bytes, once one matches, the batch update is executed. Similar to the size, this setting is per task instance; it gets multiplied at runtime by the total number of Hadoop tasks running.
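To apply these from JavaEsSparkSQL, the settings can go on the Spark config (global defaults) or into the per-write config map overload of saveToEs. A minimal sketch, assuming a recent elasticsearch-spark artifact where saveToEs takes a Dataset&lt;Row&gt; (the cluster address, index name, and input path are placeholders):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL;

public class EsBulkTuning {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("es-bulk-tuning")
                .config("es.nodes", "localhost:9200")     // placeholder cluster address
                // Global defaults for every write in this application.
                // Both limits are PER TASK: each Spark partition is written by
                // its own task, so total load ~= value * concurrent tasks.
                .config("es.batch.size.bytes", "4mb")     // default 1mb
                .config("es.batch.size.entries", "5000")  // default 1000; 0 disables
                .getOrCreate();

        Dataset<Row> dataset = spark.read().json("/path/to/input.json"); // placeholder input

        // Per-write override: this config map applies to this call only.
        Map<String, String> cfg = new HashMap<>();
        cfg.put("es.batch.size.bytes", "2mb");
        cfg.put("es.batch.size.entries", "2000");

        JavaEsSparkSQL.saveToEs(dataset, "my-index", cfg);
    }
}
```

On the partitioning part of the question: since both limits apply per task, the number of partitions in the dataset determines how many bulk writers run concurrently against the cluster, so dataset.repartition(n) before saveToEs is the other half of the tuning knob.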
Upvotes: 0