srikanth

Reputation: 978

How to speed up the Nifi streaming logs to Kafka

I'm new to NiFi and am trying to read files and push them to Kafka. From some basic reading, I was able to do that with the following flow (screenshot not shown): ListFile → FetchFile → SplitText → PublishKafka.

With this flow I'm able to achieve 0.5 million records/sec, each about 100 KB in size. I would like to reach 2 million records/sec. Throughput from the ListFile and FetchFile processors through the SplitText processors is great, but it settles down at PublishKafka.

So clearly the bottleneck is PublishKafka. How do I improve this performance? Should I tune something on the Kafka side or on the NiFi PublishKafka side?
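For context, I understand PublishKafka passes user-added dynamic properties through to the underlying Kafka producer, so batching/compression settings like these are one place I could start (values below are illustrative starting points, not tuned recommendations):

```
# Kafka producer settings, added as dynamic properties on PublishKafka
# (or in the producer config). Illustrative values only:
batch.size=262144        # larger batches per partition before a send
linger.ms=50             # wait briefly so batches can fill up
compression.type=snappy  # less network/broker I/O at some CPU cost
acks=1                   # leader-only ack; trades durability for speed
```

Is tuning in this direction the right approach, or is the flow design itself the problem?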

Can someone help me with this? Thanks.

Upvotes: 0

Views: 794

Answers (1)

notNull

Reputation: 31510

You can try using record-oriented processors, i.e. the PublishKafkaRecord_1.0 processor.

So that your flow will be:

1. ListFile
2. FetchFile
3. PublishKafkaRecord_1.0 //configure with more than one concurrent task

With this flow you don't use the SplitText processors at all; instead, define Record Reader/Writer controller services in the PublishKafkaRecord processor, which lets NiFi publish many records per flowfile without splitting.
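As a sketch, the processor configuration would look something like this (controller-service names and the topic are illustrative; pick readers/writers matching your log format):

```
# PublishKafkaRecord_1.0 -- properties tab (illustrative):
Kafka Brokers     : broker1:9092,broker2:9092
Topic Name        : logs                 # hypothetical topic
Record Reader     : CSVReader            # or GrokReader for raw log lines
Record Writer     : JsonRecordSetWriter  # or a writer matching consumers
Delivery Guarantee: Guarantee Single Node Delivery (acks=1)

# Scheduling tab:
Concurrent Tasks  : 4                    # more than one, per above
```

Because each flowfile now carries thousands of records, you avoid the per-flowfile overhead that SplitText creates.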

In addition, you can distribute the load across the cluster by using Remote Process Groups.

Flow:

1. ListFile
2. RemoteProcessGroup
3. FetchFile
4. PublishKafkaRecord_1.0 //in the Scheduling tab keep more than one concurrent task

Refer to this link for more details on designing/configuring the above flow.

Starting from NiFi 1.8, we don't need a RemoteProcessGroup to distribute the load, as we can configure load balancing directly on connections (relationships).
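Concretely, in NiFi 1.8+ this is set on the connection between ListFile and FetchFile (the strategy names below are the built-in options; "Round robin" is a reasonable default for this case):

```
# Connection configuration -> Settings tab (NiFi 1.8+):
Load Balance Strategy    : Round robin   # or: Single node,
                                         #     Partition by attribute
Load Balance Compression : Do not compress
```

With this, the listing runs on the primary node and the fetch/publish work is spread across all nodes without a Remote Process Group.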

Refer to this link and NiFi-5516 for more details on these new additions in NiFi 1.8.

Upvotes: 2
