Reputation: 978
I'm new to NiFi and trying to read files and push them to Kafka. From some basic reading, I'm able to do that with the following flow.
With this flow I'm able to achieve 0.5 million records/sec, each of size 100 KB. I would like to catch up to a speed of 2 million records/sec. Throughput from the ListFile and FetchFile processors through the SplitText processors is great, but it settles down at PublishKafka.
So clearly the bottleneck is PublishKafka. How do I improve this performance? Should I tune something on the Kafka end or on the NiFi PublishKafka end?
Can someone help me with this? Thanks.
Upvotes: 0
Views: 794
Reputation: 31510
You can try using Record Oriented processors, i.e. the PublishKafkaRecord_1.0 processor.
Your flow will then be:
1. ListFile
2. FetchFile
3. PublishKafkaRecord_1.0 //Configure with more than one concurrent task
With this flow we don't use SplitText processors at all; instead, we define RecordReader/Writer controller services in the PublishKafkaRecord processor, so records are published in bulk without splitting each file into individual FlowFiles first.
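As a rough sketch, the PublishKafkaRecord_1.0 configuration would look something like the following (property names as in NiFi 1.x; the broker/topic values and the choice of CSV reader/writer services are placeholders for illustration, not values from the question):

```
# PublishKafkaRecord_1.0 -- Properties tab (sketch; values are placeholders)
Kafka Brokers        : broker1:9092,broker2:9092
Topic Name           : my-topic
Record Reader        : CSVReader            # controller service matching your input format
Record Writer        : CSVRecordSetWriter   # controller service for the outgoing records
Delivery Guarantee   : Guarantee Replicated Delivery

# Scheduling tab
Concurrent Tasks     : 4                    # more than one, as suggested above
```

Avoiding the SplitText step means NiFi never materializes one FlowFile per record, which is usually where the throughput is lost.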
In addition, you can also distribute the load by using Remote Process Groups.
Flow:
1. ListFile
2. RemoteProcessGroup
3. FetchFile
4. PublishKafkaRecord_1.0 //In the Scheduling tab configure more than one concurrent task
Refer to this link for more details on designing/configuring the above flow.
Starting from NiFi 1.8 we don't need to use a RemoteProcessGroup to distribute the load, as we can configure connections (relationships) to load-balance across the cluster.
Refer to this and NiFi-5516 links for more details on these new additions in NiFi 1.8.
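For example, on the connection between ListFile and FetchFile you would set the load-balance strategy in the connection's configuration dialog (NiFi 1.8+; the strategy shown is one reasonable choice, not the only one):

```
# Connection configuration -- Settings tab (NiFi 1.8+)
Load Balance Strategy    : Round robin      # distribute FlowFiles evenly across cluster nodes
Load Balance Compression : Do not compress
```

This achieves the same fan-out as the RemoteProcessGroup pattern, but without the extra processors in the flow.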
Upvotes: 2