Reputation: 1624
I'm trying to configure the NiFi SplitText processor (v1.25) for a simple test: split a 10-line text file (a.txt) into 10 one-line files (I assume they'll be called a_1.txt, a_2.txt etc). I've created and configured a PutFile processor to receive the files and wired the two together.
My config (Properties) for the SplitText processor looks like:
Line Split Count = 1
Maximum Fragment Size = <No value set>
Header Line Count = 0
Header Line Marker characters = <No value set>
Remove Trailing Newlines = true
When this runs I only see one output file called 'a.txt' containing the first line of the input file (and a % sign at the end).
The relationship between the SplitText processor and the PutFile processor is set to "splits".
Reading the docs, the SplitText processor is described as:
Splits a text file into multiple smaller text files on line boundaries limited by maximum number of lines or total size of fragment. Each output split file will contain no more than the configured number of lines or bytes. If both Line Split Count and Maximum Fragment Size are specified, the split occurs at whichever limit is reached first. If the first line of a fragment exceeds the Maximum Fragment Size, that line will be output in a single split file which exceeds the configured maximum size limit. This component also allows one to specify that each split should include a header lines. Header lines can be computed by either specifying the amount of lines that should constitute a header or by using header marker to match against the read lines. If such match happens then the corresponding line will be treated as header. Keep in mind that upon the first failure of header marker match, no more matches will be performed and the rest of the data will be parsed as regular lines for a given split. If after computation of the header there are no more data, the resulting split will consists of only header lines.
Altering the config so that Line Split Count = 5 (as an example) again results in a single output file (a.txt), but with the first 5 lines of the original file present.
It seems my config is only processing the first n lines and generating a single output file (containing those n lines).
How should this be correctly configured?
Upvotes: 0
Views: 306
Reputation: 1624
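Thanks @daggett - that helped. Every split kept the original filename attribute (a.txt), so PutFile only ever wrote one file under that name. Here's the fix: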
Add an UpdateAttribute processor with the following properties:
Delete Attribute Expression = <No Value Set>
Store State = Do not store state
Stateful Variables Initial Value = <No Value Set>
Cache Value Lookup Cache Size = 100
# add a custom property
filename = ${filename}-${fragment.index}
Then connect the SplitText processor to the newly created UpdateAttribute processor. Finally, connect the UpdateAttribute processor to the PutFile processor (that holds the split files).
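To illustrate why the rename matters, here is a minimal Python sketch that imitates the flow outside NiFi. It is not NiFi code: the function names split_text, update_attribute and put_file are hypothetical, fragment.index is assumed to start at 1, and PutFile is assumed to keep the first file written under a given name and reject later ones.

```python
from typing import Dict, List


def split_text(content: str, filename: str, line_split_count: int) -> List[Dict[str, str]]:
    """Imitate SplitText: one fragment per `line_split_count` lines."""
    lines = content.splitlines()
    fragments: List[Dict[str, str]] = []
    for start in range(0, len(lines), line_split_count):
        chunk = lines[start:start + line_split_count]
        fragments.append({
            "filename": filename,  # every split inherits the parent's filename
            "fragment.index": str(len(fragments) + 1),  # assumption: 1-based index
            "content": "\n".join(chunk),
        })
    return fragments


def update_attribute(fragment: Dict[str, str]) -> Dict[str, str]:
    """Imitate the custom property filename = ${filename}-${fragment.index}."""
    renamed = dict(fragment)
    renamed["filename"] = f"{fragment['filename']}-{fragment['fragment.index']}"
    return renamed


def put_file(fragments: List[Dict[str, str]]) -> Dict[str, str]:
    """Imitate PutFile, assuming the first file written under a name wins."""
    written: Dict[str, str] = {}
    for fragment in fragments:
        written.setdefault(fragment["filename"], fragment["content"])
    return written


content = "\n".join(f"line {i}" for i in range(1, 11))
splits = split_text(content, "a.txt", 1)

without_rename = put_file(splits)
with_rename = put_file([update_attribute(s) for s in splits])

print(sorted(without_rename))  # ['a.txt']
print(len(with_rename))        # 10
```

Without the rename, all ten fragments collide on a.txt and only the first line survives, which matches what the question describes. With the rename, the files come out as a.txt-1 through a.txt-10.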
Upvotes: 1