Aviral Kumar

Reputation: 824

NiFi Content vs. Attribute Modification Techniques

In NiFi we can design a flow in two ways:

  1. Content-based modification (UpdateContent) - In this approach we directly modify the content of the flowfiles. With this, at each stage the flowfile content gets persisted to the content repository.

Sample Flow :

ListFile -> FetchFile -> ValidateRecord (sanity) -> UpdateContent -> CSVtoAvro -> AvrotoORC -> PutHDFS
  2. Attribute-based modification (UpdateAttribute) - In this approach we store the contents of the flowfiles in memory as attributes and modify them directly. Once the updates are done we write the attributes back to the flowfile content and then merge the flowfiles using MergeContent.

In terms of performance we get much better results in the first case; in the second case many of the processors are slow, like ExtractText and especially MergeContent. Having said that, I have also tuned concurrent threads and backpressure levels, but still could not achieve better performance.

ListFile -> FetchFile -> ExtractText -> UpdateAttribute -> AttributesToCSV -> CSVtoAvro -> AvrotoORC -> MergeContent -> PutHDFS (rough flow)
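Roughly, the extraction step is configured like this (the property names and regexes are illustrative; ExtractText puts each regex capture group into an attribute):

    ExtractText (one dynamic property per column, 200 in total):
        col.1    ^([^,]*)                  -> attribute col.1
        col.2    ^(?:[^,]*,){1}([^,]*)     -> attribute col.2
        ... and so on for all 200 columns ...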

I want to understand why the attribute approach is less performant and whether I am doing something wrong. Please suggest.

We have a 200-column file, with all columns treated as attributes for modification. The machine has 32 GB of RAM (16 GB for NiFi), a quad-core Intel Core i7-4771 and a 500 GB HDD.

Upvotes: 1

Views: 1552

Answers (1)

VB_

Reputation: 45712

A little bit of theory

  1. Content-based modification - is based on the Content Repository. It's just multiple binary append-only files on NiFi's local disk that are linked to Flow Files by file path and offset (here you can find more).
  2. Attribute-based modification - attributes are just a map inside the JVM heap, backed by a Write-Ahead Log (here you can find more). So attribute-based modification works with in-memory data and is faster, as the sketch below illustrates.
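To make the difference concrete, here is a minimal custom-processor sketch (a hypothetical class using the standard nifi-api; the attribute name and value are made up) showing where each kind of modification actually lands:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.Collections;
    import java.util.Set;

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.Relationship;
    import org.apache.nifi.processor.exception.ProcessException;
    import org.apache.nifi.processor.io.StreamCallback;

    // Hypothetical demo processor -- not a processor from the question's flow.
    public class ContentVsAttributeDemo extends AbstractProcessor {

        static final Relationship REL_SUCCESS = new Relationship.Builder()
                .name("success")
                .build();

        @Override
        public Set<Relationship> getRelationships() {
            return Collections.singleton(REL_SUCCESS);
        }

        @Override
        public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
            FlowFile flowFile = session.get();
            if (flowFile == null) {
                return;
            }

            // Content-based modification: copy-on-write against the Content
            // Repository. The old content claim stays where it is; the modified
            // bytes are appended as a new claim (file path + offset) on disk.
            flowFile = session.write(flowFile, new StreamCallback() {
                @Override
                public void process(InputStream in, OutputStream out) throws IOException {
                    out.write(in.readAllBytes()); // read old claim, write new claim
                }
            });

            // Attribute-based modification: a put into the flow file's in-heap
            // attribute map, persisted via the FlowFile Repository's
            // write-ahead log -- no Content Repository I/O involved.
            flowFile = session.putAttribute(flowFile, "col.1", "sanitized-value");

            session.transfer(flowFile, REL_SUCCESS);
        }
    }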

Two possible issues

  1. It doesn't look to me like you're working with attribute-based modification. MergeContent still works on content, so you need to drop the Flow File content after UpdateAttribute and before MergeContent (see the ModifyBytes sketch after this list).

  2. Alternatively, you may also check the volume of attributes. If you have too many attributes, the in-memory map will be spilled to disk and you will lose the benefit of working in memory (see the swap-threshold note below). But I think the first point is the issue.
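On the first point, a minimal way to drop the content, assuming a stock NiFi install, is the built-in ModifyBytes processor placed right after UpdateAttribute (the offsets shown are just its defaults):

    ModifyBytes:
        Start Offset         0 B
        End Offset           0 B
        Remove All Content   true    <- clears the content claim entirely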
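On the second point, the relevant knob is the per-connection swap threshold in nifi.properties (default value shown); once a queue grows past it, Flow Files are swapped out to disk, attributes included:

    # nifi.properties
    # Queues longer than this are swapped to disk, attributes and all,
    # which silently turns the "in-memory" approach back into disk I/O.
    nifi.queue.swap.threshold=20000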

P.S.

If you think that's not the case, update your question with the number of flow files, the volume of text extracted to attributes, machine characteristics, and maybe details about the content-based approach so I will be able to compare...

UPD after question update

Your content-based flow:

(1) ListFile -> (2) FetchFile -> (3) ValidateRecord (sanity) -> (4) UpdateContent -> (5) CSVtoAvro -> (6) AvrotoORC -> (7) PutHDFS

Here, at steps 3, 4, 5 and 6 you're doing copy-on-write: you read from the Content Repository (local file system) for each Flow File, modify it, and append the result back to the Content Repository. So you're doing 4 read-write iterations.

Your attribute-based flow:

(1) ListFile -> (2) FetchFile -> (3) ExtractText -> (4) UpdateAttribute -> (5) AttributesToCSV -> (6) CSVtoAvro -> (7) AvrotoORC -> (8) MergeContent -> (9) PutHDFS

Here, at steps 6 and 7 you are still doing 2 read-write iterations. Moreover, MergeContent is another bottleneck that is absent in the first option. MergeContent reads all input data from disk, merges it (in memory, I think) and copies the result back to disk. So steps 6, 7 and 8 are already slow enough to give you performance as bad as the content-based flow. Moreover, step 3 copies content to memory (another read from disk), and you may experience disk swaps. A typical MergeContent configuration is sketched below for reference.
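A bin-packing MergeContent setup usually looks like this (the values are illustrative, not a recommendation); every bin it fills is assembled from content read back off disk:

    MergeContent:
        Merge Strategy               Bin-Packing Algorithm
        Merge Format                 Binary Concatenation
        Minimum Number of Entries    1000
        Maximum Number of Entries    10000
        Max Bin Age                  5 min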

So with the attribute-based flow it looks like you have almost the same volume/amount of disk read/write transactions. At the same time you may also have contention for RAM (JVM heap), because all your content is stored in memory multiple times:

  • Each version (sanitized, updated, etc.) of an attribute is stored in memory
  • MergeContent may store another part of the data in memory

So maybe you have even more disk iterations because of disk swap (but this should be checked; it depends on the volume of files processed simultaneously).

Another point is that the answer depends on how you are doing the transformations.

Also, what processors are you using for the first approach? Are you aware of the QueryRecord processor? It can filter and transform records in a single streaming pass; a sketch follows.
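For illustration (the FLOWFILE table name is fixed by the processor; the column names are assumptions about your data, and the dynamic property name "sanitized" becomes an output relationship):

    QueryRecord:
        Record Reader    CSVReader
        Record Writer    AvroRecordSetWriter
        sanitized        SELECT UPPER(col1) AS col1, col2
                         FROM FLOWFILE
                         WHERE col1 IS NOT NULL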

Upvotes: 4
