Reputation: 37
Is there any processor in NiFi that waits for all the files to arrive and then puts them into HDFS?
For example: if a total of 5 files are to be fetched over SFTP but only 3 have been received, I want NiFi to wait until all 5 files have arrived and then put those 5 files into HDFS using PutHDFS.
Thank you for your answers.
Upvotes: 1
Views: 2188
Reputation: 5463
You can use the List* processors with a Record Writer and a MergeRecord processor to wait for a specific number of files.
Set the Record Writer property on the List* processor (for example ListSFTP). You can use any writer.
Connect success to a MergeRecord processor with its minimum and maximum bin sizes set to the number of files you want to wait for.
The merged relationship will have a single flowfile containing the file listing. Split it into individual files and process them.
Have a look at the Additional Details of the ListSFTP processor. It describes how you can wait for your batch to complete.
Upvotes: 0
Reputation: 2032
The issue is, how do you know all files have arrived? Is it always a static 5 files?
If it is absolutely always 5 files, then just use a MergeContent with Minimum Number of Entries and Maximum Number of Entries both set to 5. This means the files will wait in the queue until exactly 5 of them are ready to be merged.
But this is very inflexible to change.
Why do you need to wait for all 5 files before you put them into HDFS?
Are you trying to prevent a small files problem?
If so, you don't need to wait for all 5 files. Just use a merge processor and set a minimum size so that files are batched up until that size is reached, with a worst-case timeout.
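To make the "minimum size with a worst-case timeout" idea concrete, here is a minimal Python sketch of the release rule. It is not NiFi code and not how MergeContent is implemented. The class name Bin and its fields are made up for illustration, and I am assuming they map roughly onto MergeContent's Minimum Group Size and Max Bin Age properties.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Bin:
    """Toy model of a merge bin (hypothetical, for illustration only).

    The bin is released when the total size reaches a minimum,
    or when the oldest file has waited too long, whichever comes first.
    """
    min_size_bytes: int      # assumed analogue of a minimum group size
    max_age_seconds: float   # assumed analogue of a worst-case timeout
    files: List[Tuple[str, int]] = field(default_factory=list)
    opened_at: Optional[float] = None

    def add(self, name: str, size: int, now: float) -> None:
        if self.opened_at is None:
            self.opened_at = now  # the clock starts with the first file
        self.files.append((name, size))

    def poll(self, now: float) -> Optional[List[str]]:
        """Return the batch of file names if the bin is ready, else None."""
        if not self.files:
            return None
        total = sum(size for _, size in self.files)
        expired = now - self.opened_at >= self.max_age_seconds
        if total >= self.min_size_bytes or expired:
            batch = [name for name, _ in self.files]
            self.files, self.opened_at = [], None  # start a fresh bin
            return batch
        return None
```

With min_size_bytes=100 and max_age_seconds=60, two files of 40 and 70 bytes are released together as soon as the second one arrives, while a lone 10-byte file is held until 60 seconds have passed and then released on its own. The timeout is what stops a slow trickle of files from being stuck forever.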
Alternatively, PutHDFS has a Conflict Resolution Strategy property which can be set to append. As long as the filename is the same, this works: you can just use UpdateAttribute to set the filename to the same name, and then append the files whenever they arrive.
Upvotes: 1