pushpavanthar

Reputation: 869

Distribution of content among cluster nodes within edge NiFi processors

I was exploring the NiFi documentation. I must agree that it is one of the best-documented open-source projects out there.

My understanding is that a processor runs on all nodes of the cluster. However, I was wondering how content is distributed among cluster nodes when we use content-pulling processors like FetchS3Object, FetchHDFS, etc. With processors like FetchHDFS or FetchSFTP, will all nodes make a connection to the source? Does the processor split the content and fetch it from multiple nodes, or does one node fetch the content and load-balance it across the downstream queues?

Upvotes: 1

Views: 382

Answers (2)

Bryan Bende

Reputation: 18630

The answer by @daggett describes the approach traditionally used for this situation, often referred to as the "list + fetch" pattern. The List processor runs on the Primary Node only, and its listings are sent to a Remote Process Group (RPG) that redistributes them across the cluster. An Input Port receives the listings and is connected to a Fetch processor running on all nodes, so the actual fetching happens in parallel.
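The data flow of the list + fetch pattern can be sketched as a plain Python simulation. This is not NiFi code: the node names, the file paths and the round-robin assignment are illustrative assumptions. The point it shows is that only small listing records (paths, no content) leave the primary node, and each node then fetches a disjoint subset of the content.

```python
from collections import defaultdict
from itertools import cycle

# Hypothetical cluster and remote directory; all names are illustrative only.
NODES = ["node-1", "node-2", "node-3"]
PRIMARY = "node-1"
REMOTE_FILES = [f"/data/part-{i:02d}.csv" for i in range(7)]

def list_on_primary(files):
    """Step 1: only the primary node lists the source, emitting small
    listing records (metadata only, no file content)."""
    return [{"path": p, "listed_by": PRIMARY} for p in files]

def redistribute(listings, nodes):
    """Step 2: spread the listings across the cluster (the job the RPG and
    Input Port do in the classic pattern). Round-robin is assumed here."""
    assignment = defaultdict(list)
    for node, listing in zip(cycle(nodes), listings):
        assignment[node].append(listing)
    return assignment

def fetch_on_each_node(assignment):
    """Step 3: each node fetches only the paths it was handed, so content
    is pulled in parallel without two nodes fetching the same file."""
    return {node: [l["path"] for l in batch] for node, batch in assignment.items()}

fetched = fetch_on_each_node(redistribute(list_on_primary(REMOTE_FILES), NODES))
for node in NODES:
    print(node, fetched[node])
```

With seven files and three nodes, the first node receives three paths and the other two receive two each; every path is fetched exactly once.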

NiFi 1.8.0 introduced load-balanced connections, which remove the need for the RPG. You would still run the List processor on the Primary Node only, but connect it directly to the Fetch processor and configure the queue between them to load-balance.
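To illustrate what "configure the queue to load-balance" decides, here is a rough simulation of two strategies a load-balanced connection can use: round robin and partition by attribute. This is a sketch under assumptions, not NiFi's implementation. The node names and the `bucket` attribute are made up, and the CRC32 hash stands in for whatever mapping NiFi actually uses internally.

```python
import zlib
from collections import defaultdict
from itertools import cycle

# Hypothetical three-node cluster; node names are illustrative only.
NODES = ["node-1", "node-2", "node-3"]

def round_robin(flowfiles, nodes):
    """Hand each FlowFile to the next node in turn."""
    out = defaultdict(list)
    for node, ff in zip(cycle(nodes), flowfiles):
        out[node].append(ff)
    return dict(out)

def partition_by_attribute(flowfiles, nodes, attribute):
    """Send every FlowFile sharing a value of `attribute` to the same node.
    A stable CRC32 hash is used for illustration; NiFi's mapping may differ."""
    out = defaultdict(list)
    for ff in flowfiles:
        key = ff[attribute].encode("utf-8")
        out[nodes[zlib.crc32(key) % len(nodes)]].append(ff)
    return dict(out)

# Listings as they might leave the List processor; "bucket" is a made-up attribute.
listings = [
    {"filename": f"obj-{i}", "bucket": b}
    for i, b in enumerate(["logs", "images", "logs", "backups", "images", "logs"])
]

for name, result in [("round robin", round_robin(listings, NODES)),
                     ("partition by bucket", partition_by_attribute(listings, NODES, "bucket"))]:
    print(name, {n: [ff["filename"] for ff in ffs] for n, ffs in result.items()})
```

Round robin evens out the work, while partitioning keeps related FlowFiles together on one node, which matters when a downstream processor needs them in the same place.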

Upvotes: 1

daggett

Reputation: 28564

I think this document has an answer to your question:

https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html


For other file stores the idea is the same.

> will all nodes make connection to the source?

Yes. Unless you restrict the processor to run on the primary node only, it runs on all nodes.
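A small simulation shows why this matters for the List side of the pattern. It assumes every node sees the same shared source and that a List-style processor scheduled on a node lists that whole source independently; the `"ALL"`/`"PRIMARY"` labels loosely mimic the processor's Execution setting (All nodes / Primary node) and, like the node and file names, are hypothetical.

```python
# Hypothetical cluster and shared source; all names are illustrative.
NODES = ["node-1", "node-2", "node-3"]
PRIMARY = "node-1"
SOURCE_FILES = ["a.csv", "b.csv", "c.csv"]

def run_list_processor(execution, nodes, primary, source):
    """Return (node, file) pairs emitted by a List-style processor.
    "ALL" schedules it on every node, "PRIMARY" only on the primary node.
    Each scheduled node connects to the source and lists all of it."""
    scheduled = nodes if execution == "ALL" else [primary]
    return [(node, f) for node in scheduled for f in source]

all_nodes = run_list_processor("ALL", NODES, PRIMARY, SOURCE_FILES)
primary_only = run_list_processor("PRIMARY", NODES, PRIMARY, SOURCE_FILES)
print(len(all_nodes), len(primary_only))  # 9 3
```

Running the lister on all three nodes yields every file three times, which is why the listing step is pinned to the primary node while the fetch step is left on all nodes.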


Upvotes: 1
