Amitai Fensterheim

Reputation: 841

NiFi putFTP not efficient

I have a NiFi flow that sends more than 50 files per minute using the PutFTP processor. The server has limited resources, but I need to send at a faster pace. I looked at the FTP server logs (not NiFi's), and my conclusion:

This is not efficient. Is there any way to improve PutFTP? Is this a bug?

Upvotes: 1

Views: 792

Answers (1)

DarkLeafyGreen

Reputation: 70466

First question: use run duration

From the question: "A new FTP connection (session) is created for every file. Is there an option to send many files in one session (connect to port 21, authenticate once, and then send many files on different ports)?"

First, if it fits your use case, you can use the MergeContent processor to merge multiple smaller flow files into one bigger flow file and feed it to PutFTP.

Second, the PutFTP processor has the SupportsBatching annotation:

Marker annotation a Processor implementation can use to indicate that users should be able to supply a Batch Duration for the Processor. If a Processor uses this annotation, it is allowing the Framework to batch ProcessSessions' commits, as well as allowing the Framework to return the same ProcessSession multiple times...

Source: https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/annotation/behavior/SupportsBatching.java

Increase the run duration of your PutFTP processor (move it towards higher throughput) so that the same task handles many flow files. You might also want to adjust the Maximum Batch Size in the properties tab to accommodate that change.
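To illustrate what batching buys at the FTP level, here is a minimal sketch in Python using the standard ftplib module. It is not NiFi's implementation; the function names upload_batch and upload_all and the host and credential parameters are hypothetical. It only shows the idea from the question: connect and authenticate once, then send many files over that one session.

```python
import io
from ftplib import FTP


def upload_batch(ftp, files):
    """Send every (remote_name, payload) pair over one already-open FTP session.

    `ftp` is any object with an ftplib-style storbinary(cmd, fp) method.
    Returns the number of files sent.
    """
    for remote_name, payload in files:
        # One STOR per file, but no new login or control connection.
        ftp.storbinary(f"STOR {remote_name}", io.BytesIO(payload))
    return len(files)


def upload_all(host, user, password, files):
    """Open one session (hypothetical host and credentials), then send the batch."""
    with FTP(host) as ftp:         # single control connection
        ftp.login(user, password)  # authenticate once
        return upload_batch(ftp, files)
```

With 50 files per minute, a per-file session pays the connect and login cost 50 times, while a batched session pays it once per batch.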



Second question: inspect source code

From the question: "When sending one file, many CWD (Change Working Directory) commands are sent. For example, sending a file to /myfiles/test/dest/file.txt"

By inspecting FTPTransfer.java you can see that the put method does the following:

  • put -> get client
  • put -> get client -> resetWorkingDirectory -> changeWorkingDirectory(homeDirectory)
  • put -> setAndGetWorkingDirectory

This might be the behavior you discovered.
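The call sequence listed above can be mimicked with a small Python simulation that records the commands a client would send. This is only a sketch, not NiFi code: RecordingClient and put are hypothetical names, the home directory "/" is an assumption, and so is the idea that the destination directory is reached with a single CWD.

```python
import posixpath


class RecordingClient:
    """Stand-in for an FTP client that only records the commands it would send."""

    def __init__(self, home):
        self.home = home
        self.log = []

    def cwd(self, directory):
        self.log.append(f"CWD {directory}")

    def stor(self, name):
        self.log.append(f"STOR {name}")


def put(client, remote_path):
    # Getting the client resets the working directory to the home directory.
    client.cwd(client.home)
    # Then the working directory is set to the destination.
    # Assumption: one CWD covers the whole destination path.
    directory, name = posixpath.split(remote_path)
    client.cwd(directory)
    client.stor(name)


client = RecordingClient(home="/")  # assumed home directory
put(client, "/myfiles/test/dest/file.txt")
print(client.log)
```

Even under these simplifying assumptions, every file costs at least two CWD commands before the transfer starts, and the cost repeats for each file.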

Upvotes: 1
