Varshini

Reputation: 69

How to handle the small files problem in NiFi

My current flow in NiFi looks like this:

ListHDFS -> FetchHDFS -> SplitText -> JoltTransformJSON -> PutHBaseJSON

The hourly input of JSON files is at most 10 GB; a single file is 80-100 MB.

SplitText and JoltTransformJSON split and transform the text and emit it as 4 KB flowfiles. As a result, the hourly job takes anywhere from 50 minutes to 1 hour 20 minutes to complete. How can I make this faster? What would be the best flow to handle this use case?

I have tried using MergeContent, but it didn't work out well.

Thanks All

Upvotes: 0

Views: 604

Answers (1)

notNull

Reputation: 31520

You can use the MergeRecord processor after the JoltTransformJSON processor and set the Maximum Number of Records property so that a bin becomes eligible to be merged into a single flowfile once it has collected enough records.

Use the Max Bin Age property as a wildcard to force an eligible bin to be merged even if it has not filled up.
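As a rough starting point, the MergeRecord configuration might look like the sketch below; the reader/writer service names and the numeric thresholds are assumptions you would tune against your ~10 GB/hour volume:

    MergeRecord
        Record Reader                JsonTreeReader
        Record Writer                JsonRecordSetWriter
        Merge Strategy               Bin-Packing Algorithm
        Minimum Number of Records    10000     (assumed; raise until merged flowfiles reach a useful size)
        Maximum Number of Records    100000    (assumed; caps the size of a merged flowfile)
        Max Bin Age                  2 min     (assumed; flushes a partially filled bin so records never stall)

With 4 KB flowfiles, merging on the order of tens of thousands of records turns a flood of tiny flowfiles into a handful of large ones, which is what cuts the per-flowfile overhead downstream.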

Then use the record-oriented processor for HBase, i.e. the PutHBaseRecord processor: configure its Record Reader controller service (JsonTreeReader) to read the incoming flowfiles, and tune the Batch Size property value to get maximum performance.
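A sketch of the PutHBaseRecord side, again with placeholder values (the table name, row id field, and column family are hypothetical and must match your schema):

    PutHBaseRecord
        Record Reader                JsonTreeReader
        HBase Client Service         HBase_1_1_2_ClientService
        Table Name                   my_table       (hypothetical)
        Row Identifier Field Name    row_id         (hypothetical; a field in your JSON records)
        Column Family                cf             (hypothetical)
        Batch Size                   1000           (default; try larger values and measure)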

With this approach we process records in chunks, which increases the performance of storing data into HBase.

Flow:

ListHDFS -> FetchHDFS -> SplitText -> JoltTransformJSON -> MergeRecord -> PutHBaseRecord

Refer to these links for MergeRecord configs and Record Reader configs.

Upvotes: 3
