Reputation: 600
I am trying to write streaming data into elasticsearch with apache-nifi.putElasticSearch processor,
PutElasticSearch has property named "Batch Size", when I set this value to 1 all events are written to elasticsearch ASAP.
But such a low "batch size" obviously not working when the load is high. So in order to have a reasonable throughput I need to set it to 1000.
My question is, does PutElasticSearch waits till the batch size of events available. If yes it can wait hours when there are 999 events waiting on processor.
I am searching to understand how logstash doing same job on elasticsearch output plugin. There may be some flushing logic implemented based on time ( if events are waiting ~2 sec flush events to elasticsearch )..
You have any idea?
Edit: I just found logstash implemented this https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-idle_flush_time :)
How can I do same functionality on nifi
Upvotes: 0
Views: 453
Reputation: 28634
According to the code the batch size
parameter is a maximum number of FlowFiles from the incoming queue.
For example in case of value batch size = 1000
:
1/ if incoming queue contains 1001 flow files - only 1000 will be taken in one transaction.
2/ if incoming queue contains 999 flow files - 999 will be taken in one transaction.
And everything will be processed as soon as there is something in the incoming queue and there are available threads in nifi.
references:
Upvotes: 3