TomRobson

Reputation: 99

Bottleneck in NiFi workflow caused by Kafka

I am creating a data ingest workflow in Apache NiFi, using Kafka as a buffering system. I have a 3 node cluster set up running the same workflow, and each node has 4 cores.

The workflow moves data to and from different Kafka topics at several points. This is the slowest part of the workflow, and its performance is very inconsistent: two identical tests can differ in duration by up to 100%.

Our Publish and Consume Kafka processors are running on all three nodes, and our Kafka topics have 3 partitions across three brokers.

Does anyone have any idea of what would cause this inconsistency and what I could do to mitigate it and speed up the workflow?

Upvotes: 1

Views: 2694

Answers (1)

Bryan Bende

Reputation: 18660

The single biggest performance improvement would be to design your flow so that you have fewer flow files with many messages per flow file, rather than many flow files with one message each.

It is hard to say how to do this for your use case because I don't know anything about your flow, such as the format of the data or what you are doing to each message, but let's pretend you have CSV data. The goal would be to have one flow file with many lines of CSV, rather than one flow file per line of the CSV.

On the publishing side, when you send this flow file to PublishKafka_0_10, you would set the Message Demarcator property to a new-line (using shift+enter) and it will stream each line of the CSV to Kafka.

On the consuming side, if you also set the Message Demarcator, then it will write many messages to one flow file, up to a maximum of Max Poll Records.
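To make the batching idea concrete, here is a rough Python sketch of what a demarcator does conceptually. It is not NiFi code: the function names `split_by_demarcator` and `batch_messages` are hypothetical, the `max_records` parameter only stands in for Max Poll Records, and skipping empty segments is a simplifying assumption rather than documented processor behavior.

```python
from typing import Iterator, List


def split_by_demarcator(content: bytes, demarcator: bytes = b"\n") -> List[bytes]:
    """Publishing side: split one flow file's content into individual messages.

    Empty segments (e.g. after a trailing newline) are skipped; this is an
    assumption made for the sketch.
    """
    return [part for part in content.split(demarcator) if part]


def batch_messages(
    messages: List[bytes], max_records: int, demarcator: bytes = b"\n"
) -> Iterator[bytes]:
    """Consuming side: join up to max_records messages into one flow file."""
    for start in range(0, len(messages), max_records):
        yield demarcator.join(messages[start:start + max_records])


# One flow file holding five CSV lines becomes five Kafka messages...
flow_file = b"a,1\nb,2\nc,3\nd,4\ne,5\n"
messages = split_by_demarcator(flow_file)

# ...and the consumer writes them back as three flow files instead of five.
batches = list(batch_messages(messages, max_records=2))
```

With realistic batch sizes (hundreds or thousands of records per flow file) the number of flow files the rest of the flow has to handle drops accordingly.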

In addition, you can try tuning the Concurrent Tasks of each processor (found on the scheduling tab) in order to do more publishing or consuming in parallel. There is likely not much benefit to increasing the concurrent tasks on the consuming side since you have 3 partitions and 3 NiFi nodes, so you would already have a thread per partition, but if you had 6 partitions and 3 NiFi nodes then you might benefit from having 2 concurrent tasks.
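The reasoning about concurrent tasks can be checked with a little arithmetic. The sketch below assumes that all consumer threads belong to the same Kafka consumer group, where each partition is assigned to at most one consumer at a time; the function name `active_consumers` is made up for illustration.

```python
def active_consumers(partitions: int, nodes: int, concurrent_tasks: int) -> int:
    """Number of consumer threads that can actually be assigned a partition.

    Assumes one consumer group: a partition is read by at most one consumer,
    so threads beyond the partition count sit idle.
    """
    total_threads = nodes * concurrent_tasks
    return min(partitions, total_threads)


# 3 partitions, 3 nodes: one task per node already covers every partition.
print(active_consumers(partitions=3, nodes=3, concurrent_tasks=1))  # 3
# Raising concurrent tasks to 2 adds threads but no extra active consumers.
print(active_consumers(partitions=3, nodes=3, concurrent_tasks=2))  # 3
# 6 partitions, 3 nodes: 2 concurrent tasks doubles the active consumers.
print(active_consumers(partitions=6, nodes=3, concurrent_tasks=2))  # 6
```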

Upvotes: 5
