Reputation: 280
We have files of up to 8 GB that contain structured content, but important metadata is stored on the last line of the file and needs to be appended to each line of content. It is easy to grab this last line with a ReverseFileReader, but that requires the file to be static on disk, and I cannot find a way to do this within our existing NiFi flow. Is this possible before the data is streamed to the content repository?
Upvotes: 1
Views: 952
Reputation: 1323
Processing an 8 GB file inside NiFi might be inefficient. You could try another option:
ListSFTP --> ExecuteSparkInteractive --> RouteOnAttribute --> ...
Here you don't actually need to flow the data through NiFi. Just pass the file location (an HDFS or non-HDFS location) in a NiFi attribute and write PySpark or Spark Scala code that reads the file; you can run this code through ExecuteSparkInteractive. The code is executed on the Spark cluster and only the job result is sent back to NiFi, which you can then use to route your flow with the RouteOnAttribute processor.
Note: You need a Livy setup to run Spark code from NiFi.
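To make the "read the file outside NiFi" step concrete, here is a rough sketch of the logic such a job would need. It is plain Python rather than Spark code, so it only illustrates the two-pass idea and is not something ExecuteSparkInteractive would run as-is. The function names `read_last_line` and `append_metadata`, the comma separator, and the UTF-8 encoding are all assumptions made for illustration. It also assumes the file is complete on disk and that the metadata line is the last non-empty line.

```python
import os


def read_last_line(path, chunk_size=64 * 1024):
    """Return the last non-empty line by reading backwards from the end of the file."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b""
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            tail = f.read(step) + tail
            # Stop once the buffer contains a newline that precedes the final line.
            if b"\n" in tail.rstrip(b"\r\n"):
                break
        # Decode only after splitting on the newline byte, so multi-byte
        # characters are never cut in half.
        return tail.rstrip(b"\r\n").rsplit(b"\n", 1)[-1].decode("utf-8")


def append_metadata(src, dst, sep=","):
    """Stream src line by line, writing every content line plus the metadata to dst."""
    meta = read_last_line(src)
    with open(src, "r", encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8", newline="\n") as fout:
        prev = None
        for line in fin:
            if not line.strip():
                continue  # ignore blank lines
            # Hold each line back by one step so the final (metadata) line
            # is never written out as content.
            if prev is not None:
                fout.write(prev.rstrip("\r\n") + sep + meta + "\n")
            prev = line
    return meta
```

Only one chunk from the end of the file plus one line at a time is held in memory, so the file size should not matter much. Calling `append_metadata("in.txt", "out.txt")` (both paths hypothetical) would write the enriched lines to `out.txt` and return the metadata line.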
Hope this is helpful.
Upvotes: 2