ygk

Reputation: 600

How to design templates for a clustered NiFi

Do we need to think about the underlying cluster while designing NiFi templates?

Here is my simple flow:

+-----------------+                         +---------------+                       +-----------------+
|                 |                         |               |                       |                 |
|  READ FROM      |                         |  MERGE        |                       |   PUT HDFS      |
|  KAFKA          |                         |  FILES        |                       |                 |
|                 +-----------------------> |               +---------------------> |                 |
|                 |                         |               |                       |                 |
|                 |                         |               |                       |                 |
|                 |                         |               |                       |                 |
+-----------------+                         +---------------+                       +-----------------+

I have a 3-node cluster. When the system is running, I check the "Cluster" menu and see that only the master node is using resources; the other cluster nodes seem idle. In such a cluster, should I design the template around the cluster, or should NiFi do the load balancing itself?

I saw that one of my colleagues created a remote process group for each node in the cluster and put a load balancer in front of them within the template. Is that required? (See below.)

                                                                   +------------------+
                                                                   |                  |                 +-------------+
                                                                   | REMOTE PROCESS   |                 |  input port |
                                                            +----> | GROUP FOR        |                 |    (rpg)    |
                                                            |      | NODE 1           |                 +-------------+
                                                            |      |                  |                        |
                                                            |      |                  |                        |
                                                            |      +------------------+                        v
+-----------------+               +-----------------+       RPG
|                 |               |                 |       |                                           +--------------+
|  READ FROM      |               |                 |       |                                           |              |
|  KAFKA          |               | LOAD BALANCER   |       |       +------------------+                | MERGE FILES  |
|                 +-------------> |                 +-------------> |                  |                |              |
|                 |               |                 |       |       |  REMOTE PROCESS  |                |              |
|                 |               |                 |       |       |  GROUP FOR       |                |              |
|                 |               |                 |       |       |  NODE 2          |                |              |
+-----------------+               +-----------------+       RPG     |                  |                +--------------+
                                                            |       +------------------+                       |
                                                            |                                                  |
                                                            |                                                  v
                                                            |
                                                            |       +-------------------+               +---------------+
                                                            |       |                   |               |               |
                                                            |       |   REMOTE PROCESS  |               | PUT HDFS      |
                                                            +-----> |   GROUP FOR       |               |               |
                                                                    |   NODE 3          |               |               |
                                                                    |                   |               |               |
                                                                    |                   |               |               |
                                                                    +-------------------+               +---------------+

Also, what is the use case for the load balancer other than remote clusters? Can I use it to split traffic across several processors to speed up the operation?

Upvotes: 2

Views: 432

Answers (1)

Bryan Bende

Reputation: 18640

Apache NiFi does not do any automatic load balancing or moving of data, so it is up to you to design the data flow in a way that utilizes your cluster. How to do this will depend on the data flow and how the data is being brought into the cluster.

I wrote this article once to try and summarize the approaches:

https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html

In your case with Kafka, you should be able to have the flow run as shown in your first picture (without remote process groups). This is because Kafka is a data source that allows each node to consume different data.

If ConsumeKafka appears to be running on only one node, there could be a couple of reasons for this...

First, make sure ConsumeKafka is not scheduled for primary node only.

Second, find out how many partitions your Kafka topic has. The Kafka client (used by NiFi) assigns each partition to only one consumer in a consumer group, so if you have only 1 partition, only 1 NiFi node can ever consume from it. Here is an article that describes this behavior further:

http://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka
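
To make the partition point concrete, here is a rough sketch in plain Python. It is only a simplified round-robin model, not the Kafka client's actual assignment code, and the node names and the `assign_partitions` helper are made up for illustration. It assumes each NiFi node runs one consumer in the same consumer group.

```python
def assign_partitions(partitions, consumers):
    """Toy model: hand each partition to exactly one consumer, round-robin.

    This is NOT the real Kafka assignor; it only illustrates that a
    partition is never shared by two consumers in the same group.
    """
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def active(assignment):
    """Return the consumers that received at least one partition."""
    return [consumer for consumer, owned in assignment.items() if owned]


# Hypothetical node names for a 3-node NiFi cluster.
nodes = ["nifi-node-1", "nifi-node-2", "nifi-node-3"]

one = assign_partitions([0], nodes)          # topic with 1 partition
three = assign_partitions([0, 1, 2], nodes)  # topic with 3 partitions

print(active(one))    # only one node has anything to consume
print(active(three))  # all three nodes have a partition
```

With a single partition, two of the three nodes get nothing to read, which would look exactly like "only one node is busy" in the cluster view. Giving the topic at least as many partitions as there are consuming nodes lets every node take a share.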

Upvotes: 3
