Reputation: 434
I have a number of files going to HDFS and the naming convention is something like this:
I want to use the segment between the underscores as a variable to make the HDFS path, so it will look something like this:
/my/hdfs/directory/sponsor/2019/
I found a way to do this in two steps, but I think there must be a way to do it in one. For the first step, I have an UpdateAttribute processor that creates an attribute "file_src" with the following value:
${filename:substringAfter('_')}
So now it sees the filename as "beneficiary_20190820", etc. After this, I have another UpdateAttribute processor with an attribute named "dest" with the following value:
${file_src:substringBefore('_'):toLower()}
So now my HDFS directory can be something like this:
/my/hdfs/directory/${dest}/2019
It works, but it feels clunky. Is there a way to do everything in one step? I feel like maybe these expressions could be nested or chained. Thanks in advance for any help.
Upvotes: 0
Views: 723
Reputation: 28564
Put everything into one expression by chaining the functions:
${filename:substringAfter('_'):substringBefore('_'):toLower()}
You could even use this expression directly in the HDFS directory value, without UpdateAttribute:
/my/hdfs/directory/${filename:substringAfter('_'):substringBefore('_'):toLower()}/2019
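To check what the chained expression evaluates to, here is a rough Python sketch that mimics it outside NiFi. It is not NiFi code: the helper names and the sample filename `abc_Sponsor_20190820.csv` are made up for illustration, and it assumes that `substringAfter` and `substringBefore` split on the first occurrence of the argument and return the whole subject when the argument is absent.

```python
def substring_after(subject: str, sep: str) -> str:
    """Mimic substringAfter: text after the first `sep`, or the whole subject if `sep` is absent."""
    _head, found, tail = subject.partition(sep)
    return tail if found else subject


def substring_before(subject: str, sep: str) -> str:
    """Mimic substringBefore: text before the first `sep`, or the whole subject if `sep` is absent."""
    head, found, _tail = subject.partition(sep)
    return head if found else subject


def dest_segment(filename: str) -> str:
    """Same chain as ${filename:substringAfter('_'):substringBefore('_'):toLower()}."""
    return substring_before(substring_after(filename, "_"), "_").lower()


def hdfs_directory(filename: str) -> str:
    """Build the directory the same way the answer's one-line expression does."""
    return f"/my/hdfs/directory/{dest_segment(filename)}/2019"


if __name__ == "__main__":
    # Hypothetical filename; the real naming convention is not shown in the question.
    print(hdfs_directory("abc_Sponsor_20190820.csv"))
```

Each function in the chain receives the result of the one before it, which is why the two UpdateAttribute steps collapse into a single expression. Note that a filename with no underscore at all would pass through unchanged (only lowercased), so it may be worth routing such files separately.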
Upvotes: 2