Mike at Savient

Reputation: 280

Preprocess a large file using NiFi

We have files of up to 8 GB that contain structured content, but important metadata is stored on the last line of the file, and it needs to be appended to each line of content. It is easy to use a ReverseFileReader to grab this last line, but that requires the file to be static on disk, and I cannot find a way to do this within our existing NiFi flow. Is this possible before the data is streamed to the content repository?

Upvotes: 1

Views: 952

Answers (1)

Ajay Ahuja

Reputation: 1323

Processing an 8 GB file in NiFi might be inefficient. You could try another option:

ListSFTP --> ExecuteSparkInteractive --> RouteOnAttribute --> ...

Here you don't need to flow the data through NiFi at all. Just pass the file location (it could be an HDFS or non-HDFS location) in a NiFi attribute and write either PySpark or Spark Scala code to read that file (you can run this code through ExecuteSparkInteractive). The code is executed on the Spark cluster, and only the job result is sent back to NiFi, which you can then use to route your NiFi flow (using the RouteOnAttribute processor).

Note: You need a Livy setup to run Spark code from NiFi.
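Whichever engine ends up running the code, the job itself boils down to two passes over the file at the given location: fetch the last line, then stream the content and append that line to every record. Below is a rough plain-Python sketch of that logic, not Spark code. It assumes a local, newline-delimited file that ends with the metadata line; the function names `read_last_line` and `append_metadata` and the comma separator are made up for illustration.

```python
import os


def read_last_line(path, chunk_size=64 * 1024):
    """Return the last non-empty line of a file (as bytes) by reading
    backwards from the end, so the whole file is never loaded."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b""
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            tail = f.read(step) + tail
            # Ignore trailing newlines, then look for the newline that
            # precedes the last line.
            stripped = tail.rstrip(b"\r\n")
            if b"\n" in stripped:
                return stripped.rsplit(b"\n", 1)[1]
        # No newline found: the file holds a single line (or is empty).
        return tail.rstrip(b"\r\n")


def append_metadata(src, dst, sep=b","):
    """Copy src to dst, appending the metadata line to every content line.

    Assumption: the last non-empty line of src is the metadata line.
    Blank lines are skipped, and the metadata line itself is not copied.
    """
    meta = read_last_line(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        prev = None  # one-line lookahead so the final line is held back
        for line in fin:
            if not line.strip():
                continue
            if prev is not None:
                fout.write(prev.rstrip(b"\r\n") + sep + meta + b"\n")
            prev = line
        # prev now holds the metadata line, which is intentionally dropped.
```

Calling `append_metadata("in.txt", "out.txt")` (hypothetical paths) streams the file line by line, so memory use stays flat regardless of file size. The same two steps would have to be expressed in PySpark or Scala if the job is submitted through Livy as described above.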

Hope this is helpful.

Upvotes: 2
