Reputation: 434
I am attempting to write a file to disk with the MergeContent processor, but the output file sizes vary widely, anywhere from one line to 806 lines. I've repeated the process many times while trying to get the newline demarcator working, as described in "Apache NiFi MergeContent processor - set demarcator as new line", and I still get randomly sized files.
What parameters do I need to set to adhere to the following logic?
For completeness, these are the properties I currently have set:
As you can see, I've set "Max Bin Age" to "10 sec", following the syntax used in https://github.com/apache/nifi/blob/31fba6b3332978ca2f6a1d693f6053d719fb9daa/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/java/org/apache/nifi/processors/standard/TestMergeContent.java#L219 (the only example of this value I've managed to find; the documentation seems incomplete on this parameter).
I've set "Maximum Number of Entries" to 5000 and "Maximum number of Bins" to 1.
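Values such as "10 sec" follow a "<number> <unit>" time-period shape. The sketch below is a hypothetical helper (`TimePeriodSketch.toMillis` is not a NiFi API) that accepts only a small, assumed subset of the unit spellings NiFi time-period properties take, just to show how "10 sec" or "5 min" map to a duration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimePeriodSketch {
    // Assumed subset of unit spellings for illustration; NiFi accepts more.
    private static final Map<String, TimeUnit> UNITS = Map.ofEntries(
        Map.entry("ms", TimeUnit.MILLISECONDS),
        Map.entry("millis", TimeUnit.MILLISECONDS),
        Map.entry("s", TimeUnit.SECONDS),
        Map.entry("sec", TimeUnit.SECONDS),
        Map.entry("secs", TimeUnit.SECONDS),
        Map.entry("second", TimeUnit.SECONDS),
        Map.entry("seconds", TimeUnit.SECONDS),
        Map.entry("m", TimeUnit.MINUTES),
        Map.entry("min", TimeUnit.MINUTES),
        Map.entry("mins", TimeUnit.MINUTES),
        Map.entry("minute", TimeUnit.MINUTES),
        Map.entry("minutes", TimeUnit.MINUTES),
        Map.entry("h", TimeUnit.HOURS),
        Map.entry("hr", TimeUnit.HOURS),
        Map.entry("hour", TimeUnit.HOURS),
        Map.entry("hours", TimeUnit.HOURS));

    // "<non-negative integer><optional whitespace><unit>", e.g. "10 sec"
    private static final Pattern PERIOD = Pattern.compile("(\\d+)\\s*([a-zA-Z]+)");

    /** Converts a period such as "10 sec" or "5 min" to milliseconds. */
    public static long toMillis(String period) {
        Matcher m = PERIOD.matcher(period.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a time period: " + period);
        }
        TimeUnit unit = UNITS.get(m.group(2).toLowerCase(Locale.ROOT));
        if (unit == null) {
            throw new IllegalArgumentException("Unknown unit: " + m.group(2));
        }
        return unit.toMillis(Long.parseLong(m.group(1)));
    }

    public static void main(String[] args) {
        System.out.println(toMillis("10 sec")); // 10000
        System.out.println(toMillis("5 min"));  // 300000
    }
}
```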
What do I need to do to aggregate my records according to the logic above? I also tried setting the "Correlation Attribute Name" property to an attribute guaranteed to be identical on all documents reaching this point, and saw the same behavior.
Upvotes: 7
Views: 6083
Reputation: 339
In case anyone else runs into this exact issue, the cause may be the Run Schedule on the MergeContent processor. After a lot of troubleshooting, I realized that this is one of those processors for which a "0 sec" Run Schedule is not appropriate. I had already set Minimum Number of Entries to a high number, set Maximum Number of Entries, and set Max Bin Age to 5 min. It was the schedule that kept the processor grabbing FlowFiles and bundling them into randomly sized merges.
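One way to picture why the schedule matters is a toy model (not NiFi's code) of a single bin that is allowed to close as soon as it holds `minEntries`: each run merges whatever arrived since the previous run, so the merge sizes track arrival timing rather than any fixed count. `RunScheduleModel` and its parameters are hypothetical, and the model leaves out other interactions, such as bins being forced out when the "Maximum number of Bins" limit is hit.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of a merge step that runs every runEveryMs with a single bin.
 * Assumption: on each run it pulls everything queued, merges the bin whenever it
 * reaches maxEntries, and also merges the remainder if it holds >= minEntries.
 */
public class RunScheduleModel {

    public static List<Integer> mergedSizes(long[] arrivalMs, long runEveryMs,
                                            int minEntries, int maxEntries) {
        if (runEveryMs <= 0) {
            throw new IllegalArgumentException("model needs a positive interval");
        }
        List<Integer> sizes = new ArrayList<>();
        int next = 0; // index of the first flowfile not yet pulled into the bin
        int bin = 0;  // entries currently held in the single bin
        long last = arrivalMs.length == 0 ? 0 : arrivalMs[arrivalMs.length - 1];
        for (long now = runEveryMs; now <= last + runEveryMs; now += runEveryMs) {
            // Pull every flowfile that has arrived by this run.
            while (next < arrivalMs.length && arrivalMs[next] <= now) {
                bin++;
                next++;
                if (bin == maxEntries) { // bin full: merge it right away
                    sizes.add(bin);
                    bin = 0;
                }
            }
            // Lenient rule: once the minimum is met, merge whatever is there.
            if (bin > 0 && bin >= minEntries) {
                sizes.add(bin);
                bin = 0;
            }
        }
        // A remainder below minEntries stays binned (Max Bin Age would flush it).
        return sizes;
    }

    public static void main(String[] args) {
        long[] arrivals = {1, 2, 3, 40, 41, 90};
        System.out.println(mergedSizes(arrivals, 25, 1, 5000));  // [3, 2, 1]
        System.out.println(mergedSizes(arrivals, 100, 1, 5000)); // [6]
    }
}
```

With the same arrivals, a 25 ms schedule produces three small merges while a 100 ms schedule produces one, which is the "random sizes" symptom in miniature.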
Upvotes: 0
Reputation: 1633
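The most important setting here is actually the minimum number of entries. The binning algorithm is lenient about the number of items: "Minimum Number of Entries" defaults to 1, so a bin can be merged as soon as it holds a single FlowFile, and each output contains however many FlowFiles happened to be binned at that moment.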
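The readiness rule described above can be sketched as a single predicate. This is an assumption about the behavior, not a copy of NiFi's source; `BinReadiness.isReady` is hypothetical, and treating a `maxBinAgeMs` of 0 as "Max Bin Age not set" is a convention chosen for the sketch.

```java
public class BinReadiness {

    /**
     * True when a bin would be merged on the current run (toy rule).
     * Maximum Number of Entries is left out on purpose: it only caps how many
     * entries a bin accepts, and a full bin already satisfies the minimum.
     */
    public static boolean isReady(int entries, long ageMs,
                                  int minEntries, long maxBinAgeMs) {
        boolean fullEnough = entries >= minEntries;              // lenient rule
        boolean tooOld = maxBinAgeMs > 0 && ageMs >= maxBinAgeMs; // Max Bin Age expired
        return entries > 0 && (fullEnough || tooOld);
    }

    public static void main(String[] args) {
        System.out.println(isReady(1, 0, 1, 10_000));         // true: minimum of 1 met
        System.out.println(isReady(1, 0, 5000, 10_000));      // false: keeps waiting
        System.out.println(isReady(5000, 0, 5000, 10_000));   // true: minimum reached
        System.out.println(isReady(1, 10_000, 5000, 10_000)); // true: bin aged out
    }
}
```

Raising the minimum is what turns "merge whatever is there" into "wait for a full bin", with Max Bin Age as the only escape hatch.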
For your specific logic, you would want to leave things as they stand and:
Below is an image of the configuration above, where the minimum and maximum number of entries are both 5000 and only 1 bin is handled at a time. In this case you'll see that exactly 20000 files have been merged into 4 (20000 / 5000 = 4).
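The result can be checked with a small toy model of one bin whose minimum and maximum number of entries are equal: only full bins are merged, so the arrival pattern no longer affects output size. The batch sizes below are made up for illustration, and `FixedSizeBins` is hypothetical, not part of NiFi.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: one bin, Minimum and Maximum Number of Entries both equal. */
public class FixedSizeBins {

    public static List<Integer> merge(int[] batches, int entriesPerBin) {
        List<Integer> merged = new ArrayList<>();
        int bin = 0;
        for (int batch : batches) {          // each batch = flowfiles pulled in one run
            for (int i = 0; i < batch; i++) {
                bin++;
                if (bin == entriesPerBin) {  // min == max, so only a full bin merges
                    merged.add(bin);
                    bin = 0;
                }
            }
        }
        // Any remainder below entriesPerBin would wait for Max Bin Age.
        return merged;
    }

    public static void main(String[] args) {
        int[] batches = {1, 806, 3000, 7193, 9000}; // hypothetical, sums to 20000
        System.out.println(merge(batches, 5000));   // [5000, 5000, 5000, 5000]
    }
}
```

However the 20000 FlowFiles are split across runs, the output is four merges of 5000 each.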
Upvotes: 7