Exorcismus

Reputation: 2482

Rename file after putHDFS

I have an Apache NiFi job where I get a file from the system using GetFile and then use PutHDFS. How can I rename the file in HDFS after putting it in Hadoop? I tried to use an ExecuteScript processor but can't get it to work:

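# ExecuteScript (Script Engine: python); session and REL_SUCCESS are provided by NiFi
flowFile = session.get()
if flowFile is not None:
    # Strip the '._COPYING_' suffix from the filename attribute
    tempFileName = flowFile.getAttribute('filename')
    fileName = tempFileName.replace('._COPYING_', '')
    flowFile = session.putAttribute(flowFile, 'filename', fileName)
    session.transfer(flowFile, REL_SUCCESS)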

Upvotes: 0

Views: 1937

Answers (2)

notNull

Reputation: 31470

Instead of using the ExecuteScript processor (extra overhead), use an UpdateAttribute processor and feed it the success relationship from PutHDFS.

Add a new property to the UpdateAttribute processor:

filename

${filename:replaceAll('<regex_expression>','<replacement_value>')}

This uses the replaceAll function from the NiFi Expression Language, which takes a regular expression.

(or)

Use the replace function, which matches a literal string instead of a regex:

filename

${filename:replace('<search_string>','<replacement_value>')}


The NiFi Expression Language offers several functions for manipulating strings; refer to the Expression Language guide for more documentation.
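To make the difference between the two functions concrete, here is a rough Python analogy (not NiFi code). It assumes that replace behaves like a literal string substitution and replaceAll like a regex substitution; the helper names replace_literal and replace_regex are made up for illustration, and NiFi's regex flavor is Java's, which differs from Python's in some details such as back-reference syntax.

```python
import re


def replace_literal(filename, search, replacement):
    # Rough analogue of ${filename:replace(...)}: 'search' is matched literally.
    return filename.replace(search, replacement)


def replace_regex(filename, pattern, replacement):
    # Rough analogue of ${filename:replaceAll(...)}: 'pattern' is a regular expression.
    return re.sub(pattern, replacement, filename)


name = "fn._COPYING_"
print(replace_literal(name, "._COPYING_", ""))    # fn
print(replace_regex(name, r"\._COPYING_$", ""))   # fn

# An unescaped '.' is a wildcard in the regex version but a plain dot in the literal one.
print(replace_literal("a.b.c", ".", "_"))         # a_b_c
print(replace_regex("a.b.c", ".", "_"))           # _____
```

The last two lines show why the dot in '._COPYING_' should be escaped when it is used as a regex.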

I tried the exact same script from the question in an ExecuteScript processor with the Script Engine set to python, and everything works as expected.

Because you are using the .replace function and replacing with '', the filename fn._COPYING_ is changed to fn.

Upvotes: 1

Bryan Bende

Reputation: 18630

The answer above by Shu is correct for how to manipulate the filename attribute in NiFi. However, if you have already written a file to HDFS and then use UpdateAttribute, it is not going to change the name of the file in HDFS; it will only change the value of the filename attribute in NiFi.

You could use the UpdateAttribute approach to create a new attribute called "final.filename" and then use MoveHDFS to move the original file to the final file.

Also of note, the PutHDFS processor already writes a temp file and moves it to the final file, so I'm not sure it is necessary for you to name the file "._COPYING_". For example, if you send a flow file to PutHDFS with a filename of "foo", it will first write ".foo" to the directory and, when done, move it to "foo".

The only case where you need to use MoveHDFS is if some other process is monitoring the directory and can't ignore the dot files. In that case, write the file somewhere else and use MoveHDFS once it is complete.
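The write-to-dot-file-then-rename pattern described above can be sketched on a local filesystem in Python. This is only an illustration of the idea, not PutHDFS's actual implementation; the function names put_with_dot_rename and visible_files are hypothetical, and os.replace stands in for the HDFS rename.

```python
import os
import tempfile


def put_with_dot_rename(directory, filename, data):
    # Write to a hidden temp name first (".foo"), then rename to the final name ("foo").
    tmp_path = os.path.join(directory, "." + filename)
    final_path = os.path.join(directory, filename)
    with open(tmp_path, "wb") as f:
        f.write(data)
    # The final name only appears once the content is fully written.
    os.replace(tmp_path, final_path)
    return final_path


def visible_files(directory):
    # What a well-behaved monitoring process would see: dot files are skipped.
    return sorted(n for n in os.listdir(directory) if not n.startswith("."))


with tempfile.TemporaryDirectory() as d:
    put_with_dot_rename(d, "foo", b"hello")
    print(visible_files(d))  # ['foo']
```

A consumer that filters the way visible_files does never sees a partially written file, which is why the extra MoveHDFS step is only needed when the consumer cannot skip dot files.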

Upvotes: 4
