Reputation: 2482
I have apache NIFI job where I get file from system using getFile
then I use putHDFS
, how can I rename the file in HDFS after putting the file in hadoop ?
I tried to use executeScript
processor but can't get it to work
flowFile = session.get()
if flowFile != None:
tempFileName= flowFile.getAttribute("filename")
fileName=tempFileName.replace('._COPYING_','')
flowFile = session.putAttribute(flowFile, 'filename', fileName)
session.transfer(flowFile, REL_SUCCESS)
Upvotes: 0
Views: 1937
Reputation: 31470
Instead of using ExecuteScript processor(extra overhead) use UpdateAttribute processor Feed the Success relationship from PutHDFS
Add new property in UpdateAttribute processor as
filename
${filename:replaceAll('<regex_expression>','<replacement_value>')}
Use replaceAll function from NiFi Expression Language.
(or)
Using replace Function
filename
${filename:replaceAll('<search_string>','<replacement_value>')}
NiFi expression language offers different functions to manipulate strings refer to this link for more documentation related to expression language.
i have tried same exact script that in Question with ExecuteScript processor with Script Engine as Python and everything works as expected.
As you are using .replace
function and replacing with ''
Output:
As the filename fn._COPYING_
got changed to fn
.
Upvotes: 1
Reputation: 18630
The answer above by Shu is correct for how to manipulate the filename attribute in NiFi, but if you have already written a file to HDFS and then use UpdateAttribute, it is not going to change the name of the file in HDFS, it will only change the value of the filename attribute in NiFi.
You could use the UpdateAttribute approach to create a new attribute called "final.filename" and then use MoveHDFS to move the original file to the final file.
Also of note, the PutHDFS processor already writes a temp file and moves it to the final file so I'm not sure if it is necessary for you to name ".COPYING". For example if you send a flow file to PutHDFS with filename of "foo" it will first write ".foo" to the directory and when done it will move it to "foo".
The only case where you need to use MoveHDFS is if some other process is monitoring the directory and can't ignore the dot files, then you write it somewhere else and use MoveHDFS once it is complete.
Upvotes: 4