M_Gh

Reputation: 1142

how to distribute data in Kafka cluster with Apache NIFI

I have a Flink cluster running in Docker (Docker is installed on three different physical nodes); the same three nodes also form a Kafka cluster (see the picture below). My problem is how to distribute data between the partitions of the topic. (cluster diagram image omitted)

My goal is to distribute data among the partitions. I have two options:

  1. Writing a simple program to distribute data.

  2. Using Apache NiFi.

Every node has a socket that receives data, and the incoming data flow is the same on all three nodes.

My question is: to distribute data among the partitions of the topic, should I run Apache NiFi (or the simple program) on only one of the three nodes to avoid duplicated data in the partitions, or can I run Apache NiFi on every node, with NiFi itself preventing duplicated data in the partitions?
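A minimal sketch of option 1, assuming plain round-robin assignment (the strategy Kafka's default partitioner historically used for keyless records; a real program could simply send records without a key and let the producer's partitioner decide). The helper name is hypothetical, not a Kafka API:

```python
from itertools import count

def make_round_robin_partitioner(num_partitions):
    """Return a function that assigns each record to the next
    partition in turn, cycling 0, 1, ..., num_partitions - 1."""
    counter = count()
    def next_partition(_record):
        return next(counter) % num_partitions
    return next_partition

# Distribute six records across a 3-partition topic.
partitioner = make_round_robin_partitioner(3)
assignments = [partitioner(msg) for msg in ["a", "b", "c", "d", "e", "f"]]
print(assignments)  # -> [0, 1, 2, 0, 1, 2]: records spread evenly
```

Note that newer Kafka producers use a "sticky" partitioner for keyless records (batching to one partition at a time), which still balances data over time.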

Thank you in advance.

Upvotes: 0

Views: 365

Answers (1)

Bryan Bende

Reputation: 18660

You can run a NiFi cluster on multiple nodes, but it is up to you to design the data flow in a way that does not produce duplicate data.

For example, if you run a 3-node NiFi cluster and the starting point of your flow is an InvokeHttp processor that retrieves some data using an HTTP GET, and you run this processor on all 3 nodes, then all 3 nodes fetch the same data and all 3 nodes would publish the same data to Kafka.

If you run the InvokeHttp processor on the primary node only, then only one node would publish that data.

This is just an example. It depends on your data flow.
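The difference can be illustrated with a small simulation (hypothetical helper names, not the NiFi API): when every node runs the source processor against the same data, each record is published once per node; restricting the source to the primary node publishes each record once.

```python
def run_flow(nodes_running_source, fetch):
    """Simulate each listed node fetching from the same upstream
    source and publishing to Kafka; return everything published."""
    published = []
    for node in nodes_running_source:
        published.extend(fetch())  # every running node gets the same data
    return published

fetch = lambda: ["record-1", "record-2"]

all_nodes = run_flow(["node1", "node2", "node3"], fetch)
primary_only = run_flow(["node1"], fetch)

print(len(all_nodes))     # 6 -> the same two records published three times
print(len(primary_only))  # 2 -> each record published exactly once
```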

Upvotes: 1
