Vijju

Reputation: 37

NiFi: wait for all the files to arrive

Is there a processor in NiFi that waits for all the files to arrive and then puts them into HDFS?

For example: if a total of 5 files are to be fetched over SFTP but only 3 have been received, I want NiFi to wait until all 5 files have arrived and then put those 5 files into HDFS using PutHDFS.

Thank you for your answers.

Upvotes: 1

Views: 2188

Answers (2)

Dakshinamurthy Karra

Reputation: 5463

You can use List* processors with a Record Writer and use a MergeRecord processor to wait for a specific number of files.

  1. Use a ListSFTP processor and set its Record Writer property. Any record writer will do.
  2. Connect the success relationship to a MergeRecord processor, with the minimum and maximum number of records both set to the number of files you want to wait for.
  3. The merged relationship will then carry a single flowfile containing the file listing. Split it into individual entries and process each file.

Have a look at the Additional Details of the ListSFTP processor. They describe how to wait for a batch to complete before processing it.
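To make the idea concrete, here is a minimal Python sketch of the logic those steps rely on. It is only a model of the behaviour, not NiFi code: `merge_listings`, the record layout, and the file names are all made up for illustration. Listing records pile up across polls, and nothing is passed on until the expected number of entries has been seen.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]  # hypothetical shape of one listing entry


def merge_listings(listings: Iterable[List[Record]], expected: int) -> Iterator[List[Record]]:
    """Accumulate listing records across polls; release them in groups of `expected`."""
    pending: List[Record] = []
    for listing in listings:
        pending.extend(listing)
        while len(pending) >= expected:
            yield pending[:expected]
            pending = pending[expected:]
    # Anything left in `pending` is simply not released yet.


# Two polls of the SFTP directory: 3 files first, the remaining 2 later.
first_poll = [{"filename": "a.csv"}, {"filename": "b.csv"}, {"filename": "c.csv"}]
second_poll = [{"filename": "d.csv"}, {"filename": "e.csv"}]

for batch in merge_listings([first_poll, second_poll], expected=5):
    # "Split" step: handle each listed file individually (fetch, then put).
    for record in batch:
        print(record["filename"])
```

With only the first poll, the loop body never runs, which is the "wait" the question asks for.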

Upvotes: 0

Sdairs

Reputation: 2032

The issue is, how do you know all files have arrived? Is it always a static 5 files?

If it is absolutely always 5 files, then just use a MergeContent with both Minimum Number of Entries and Maximum Number of Entries set to 5. Files will then wait in the queue until exactly 5 of them are ready to be merged.

But this is very inflexible if the number of files ever changes.

Why do you need to wait for all 5 files before you put them into HDFS?

Are you trying to prevent a small files problem?

If so, you don't need to wait for all 5 files. Just use a merge processor with a minimum size set, so that files are bucketed up until that size is reached, with a worst-case timeout.
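As a rough illustration of "minimum size with a worst-case timeout", here is a small Python model of a bin. It is a sketch of the idea only, not NiFi's implementation: `SizeBin`, `min_bytes` and `max_age_s` are hypothetical names, and time is passed in explicitly to keep the example deterministic. In MergeContent the corresponding settings are, as far as I know, Minimum Group Size and Max Bin Age.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SizeBin:
    """Holds files until their total size reaches min_bytes or the bin gets too old."""
    min_bytes: int
    max_age_s: float
    files: List[Tuple[str, int]] = field(default_factory=list)
    opened_at: Optional[float] = None

    def offer(self, name: str, size: int, now: float) -> Optional[List[Tuple[str, int]]]:
        # The age of the bin is measured from the first file that entered it.
        if self.opened_at is None:
            self.opened_at = now
        self.files.append((name, size))
        return self.poll(now)

    def poll(self, now: float) -> Optional[List[Tuple[str, int]]]:
        if not self.files:
            return None
        total = sum(size for _, size in self.files)
        if total >= self.min_bytes or now - self.opened_at >= self.max_age_s:
            released = self.files
            self.files = []
            self.opened_at = None
            return released
        return None


bin_ = SizeBin(min_bytes=100, max_age_s=60.0)
print(bin_.offer("a.csv", 40, now=0.0))   # None: still below the minimum size
print(bin_.offer("b.csv", 70, now=5.0))   # [('a.csv', 40), ('b.csv', 70)]
print(bin_.offer("c.csv", 10, now=10.0))  # None: new bin, too small
print(bin_.poll(now=75.0))                # [('c.csv', 10)]: released by the timeout
```

The timeout is what keeps a lone small file from sitting in the queue forever.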

Alternatively, PutHDFS has a Conflict Resolution Strategy property which can be set to append, as long as the filename is the same. You can use UpdateAttribute to set the filename attribute to the same name for every file, and then append the files whenever they arrive.
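A tiny Python sketch of what the append approach amounts to, assuming every incoming file is renamed to one fixed target name. The dictionary stands in for HDFS, and `put_with_append` and `TARGET_NAME` are invented for the example, not NiFi or HDFS APIs.

```python
from typing import Dict


def put_with_append(store: Dict[str, bytes], filename: str, content: bytes) -> None:
    """Write content under filename, appending if that name already exists."""
    store[filename] = store.get(filename, b"") + content


TARGET_NAME = "combined.csv"  # hypothetical fixed name set on every incoming file

store: Dict[str, bytes] = {}
for original_name, content in [("a.csv", b"1\n"), ("b.csv", b"2\n")]:
    # The original name is discarded; everything lands in the same target file.
    put_with_append(store, TARGET_NAME, content)

print(store)  # {'combined.csv': b'1\n2\n'}
```

Note that the original file names are lost this way, so it only fits cases where one growing file is acceptable.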

Upvotes: 1
