cyclic

Reputation: 431

Apache NiFi irregular data flow through Remote Process Groups

I am evaluating Apache NiFi for use in a project. I have four instances of NiFi v1.1.2 running in the cloud on Ubuntu 14 systems. Three of the instances (R1, R2 & R3) are accessed as Remote Process Groups, and the remaining instance (M1) manages the flow between the RPGs. M1 generates a FlowFile, passes it through a pipeline consisting of the three RPGs, and logs it at the end. Each RPG simply appends R{id} to a ProcessedBy attribute on the FlowFile so that the order in which the data is processed can be easily seen.

The problem is that the order is not always as expected. I use two pipelines (P1 & P2) that traverse the RPGs in the order R1->R2->R3 and R2->R1->R3 respectively. What I am seeing is that ~50% of the time the FlowFile in P1 is not processed by R2 at all, while the FlowFile in P2 reverses direction and is processed by R2 twice, so the flow order becomes R2->R1->R2->R3.

Edit:

Here is an image of my flow in M1:

[image: NiFi Flow]

Upvotes: 2

Views: 880

Answers (1)

James

Reputation: 11931

I do not believe Remote Process Groups behave with the kind of "function semantics" you are expecting. The odd flowfile traffic pattern happens because flowfiles originating on the left of your flow are emerging from RPG outputs on the right (and the reverse), but this is correct behavior for an RPG output port.

Sending a flowfile to a remote flow through an input port does not guarantee that it will "return" via the output port on the same RPG diagram node. When multiple listeners are connected to a remote output port, each one receives a share of the outputs. Visually connecting an RPG input with its output is typical, recommended, and arguably the most self-explanatory way to organize your flow, but it is not required.
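To see how this sharing can produce the paths described in the question, here is a minimal toy simulation in Python. It does not use any NiFi API. It assumes each remote instance exposes a single output port, that both pipelines on M1 have an RPG node listening to that port, and that each output is picked up by one of the listeners at random (the real distribution depends on timing and scheduling). The names `NEXT_HOP`, `FIRST_HOP` and `run_flowfile` are made up for this sketch.

```python
import random
from collections import Counter

# Toy model only, not NiFi code. Each remote has ONE shared output port.
# Key: (pipeline whose canvas node pulled the output, remote it came from).
# Value: where that pipeline sends the flowfile next ("LOG" ends the flow).
NEXT_HOP = {
    ("P1", "R1"): "R2",
    ("P1", "R2"): "R3",
    ("P1", "R3"): "LOG",
    ("P2", "R2"): "R1",
    ("P2", "R1"): "R3",
    ("P2", "R3"): "LOG",
}
FIRST_HOP = {"P1": "R1", "P2": "R2"}


def run_flowfile(origin, rng):
    """Follow one flowfile and return the remotes that processed it, in order."""
    path = []
    remote = FIRST_HOP[origin]
    while remote != "LOG":
        path.append(remote)
        # Every pipeline with a canvas node on this remote's output port competes.
        listeners = [p for (p, r) in NEXT_HOP if r == remote]
        receiver = rng.choice(listeners)  # assumption: uniform random pick
        remote = NEXT_HOP[(receiver, remote)]
    return "->".join(path)


if __name__ == "__main__":
    rng = random.Random(0)
    for origin in ("P1", "P2"):
        counts = Counter(run_flowfile(origin, rng) for _ in range(1000))
        print(origin, counts.most_common(4))
```

In this model a flowfile that starts in P1 skips R2 whenever P2's listener pulls it from R1's output port, and a flowfile that starts in P2 can visit R2 twice when P1's listener pulls it from R1.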

You could create differently named ports on the remote NiFi instances to give you more remote input/output options, for example a separate pair of ports for each pipeline.
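As a rough illustration of why separate ports help, the sketch below (plain Python, no NiFi API) lists which remote output ports have more than one listener on the M1 canvas. The port names `out`, `p1-out` and `p2-out` are hypothetical, and the sketch assumes each remote flow routes a flowfile from a pipeline's input port to that same pipeline's output port.

```python
from collections import defaultdict


def contended_ports(connections):
    """Return {(remote, port): [listeners]} for ports with more than one listener.

    connections is a list of (listener, remote, port) tuples describing which
    pipeline on M1 pulls from which output port of which remote instance.
    """
    by_port = defaultdict(list)
    for listener, remote, port in connections:
        by_port[(remote, port)].append(listener)
    return {key: names for key, names in by_port.items() if len(names) > 1}


PIPELINES = ("P1", "P2")
REMOTES = ("R1", "R2", "R3")

# Layout from the question: both pipelines listen to the same port on each remote.
shared = [(p, r, "out") for p in PIPELINES for r in REMOTES]

# Alternative: one hypothetical output port per pipeline on each remote.
dedicated = [(p, r, f"{p.lower()}-out") for p in PIPELINES for r in REMOTES]

if __name__ == "__main__":
    print("shared layout:", contended_ports(shared))
    print("dedicated layout:", contended_ports(dedicated))
```

With a dedicated port per pipeline, no output port has competing listeners, so each pipeline only receives the flowfiles it sent.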

I made a sample flow using only two NiFi instances, with node1.nifi sending to and receiving from a Remote Process Group on node2.nifi. I organized the flow to emphasize the potentially disconnected relationship between the RPG input and output ports.

[image: NiFi Remote Process Groups]

The three RPG graph nodes all reference the same RPG on node2.nifi, but inputs and outputs are separated. Output is received in two locations, which has resulted in a slightly unequal distribution.

Upvotes: 2
