Reputation: 2913
I'm using the GenerateFlowFile processor in Apache NiFi. When I activate it, I want the processor to create exactly 1 FlowFile.
Right now I use the REST API via Python to change the state to RUNNING, wait 0.5 seconds and change the state to STOPPED. This results in 1 FlowFile being added to the queue leading to the next processor.
I tested a bit: waiting for 1.5 seconds gives me 2 FlowFiles, and 2.5 seconds gives me 3 FlowFiles. I'm guessing the processor generates one FlowFile for each second it is running.
How can I ensure that exactly 1 FlowFile is generated? The above method obviously depends on the network connection and round-trip times. Worst case: the connection drops while I wait, I can no longer stop the processor, and an arbitrary number of FlowFiles are generated.
My current configs are:
Settings:
  Yield Duration: 1 sec
  Penalty Duration: 30 sec
  Bulletin Level: WARN
Scheduling:
  Scheduling Strategy: CRON driven
  Concurrent Tasks: 1
  Run Schedule: * * * * * ?
  Execution: All nodes
  Run Duration: 0 ms
Properties:
  File Size: 0B
  Batch Size: 1
  Data Format: Text
  Unique FlowFiles: false
  Custom Text: No value set
  Character Set: UTF-8
  Mime Type: No value set
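For reference, the start/wait/stop sequence described above can be sketched roughly as follows. This is a minimal sketch, not my exact script: the base URL, the helper names (`http_json`, `set_state`, `pulse`) and the use of `urllib` are assumptions for illustration, and it assumes an unsecured NiFi instance that exposes the `PUT /processors/{id}/run-status` endpoint.

```python
import json
import time
import urllib.request

# Assumed base URL of an unsecured NiFi instance; adjust for your setup.
NIFI_API = "http://localhost:8080/nifi-api"


def run_status_payload(version, state):
    """Build the request body for PUT /processors/{id}/run-status."""
    return {"revision": {"version": version}, "state": state}


def http_json(method, url, body=None):
    """Send a JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def set_state(processor_id, state, send=http_json):
    """Fetch the processor's current revision, then request a run state."""
    url = f"{NIFI_API}/processors/{processor_id}"
    version = send("GET", url)["revision"]["version"]
    return send("PUT", f"{url}/run-status", run_status_payload(version, state))


def pulse(processor_id, wait_seconds=0.5, send=http_json):
    """Start the processor, wait, then always attempt to stop it."""
    set_state(processor_id, "RUNNING", send)
    try:
        time.sleep(wait_seconds)
    finally:
        # If this request fails (e.g. the connection dropped),
        # the processor keeps running: the problem described above.
        set_state(processor_id, "STOPPED", send)
```

Calling `pulse("<processor-id>")` starts and stops the processor; the number of FlowFiles produced depends entirely on how long the two requests and the sleep take.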
Upvotes: 1
Views: 3029
Reputation: 2032
You'll want to set the GenerateFlowFile processor's Execution to Primary node only (assuming you have more than 1 node) so that each node does not generate its own FlowFile.
Set the Scheduling Strategy to Timer driven and whack the Run Schedule up to something like 604800 sec (1 week). A timer-driven processor fires once when it is started, so even if you leave it running, it's only going to run again a week later. That should give you plenty of time to fix a connectivity issue if your script can't connect to tell the processor to stop.
Keep Concurrent Tasks at 1.
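If you'd rather apply these settings from your Python script than in the UI, something like the sketch below could build the request body for `PUT /nifi-api/processors/{id}`. Treat it as an assumption-laden example: the function name `scheduling_update` is made up, and the config field names (`schedulingStrategy`, `schedulingPeriod`, `executionNode`, `concurrentlySchedulableTaskCount`) are what I understand the processor config entity to use, so check them against the REST API docs for your NiFi version.

```python
import json

# 7 days * 24 hours * 60 minutes * 60 seconds
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def scheduling_update(processor_id, version):
    """Build a body for PUT /nifi-api/processors/{id} (field names assumed)."""
    return {
        "revision": {"version": version},
        "component": {
            "id": processor_id,
            "config": {
                "schedulingStrategy": "TIMER_DRIVEN",
                "schedulingPeriod": f"{SECONDS_PER_WEEK} sec",
                "executionNode": "PRIMARY",
                "concurrentlySchedulableTaskCount": 1,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical processor id and revision, for illustration only.
    print(json.dumps(scheduling_update("hypothetical-processor-id", 0), indent=2))
```

The processor has to be stopped before its configuration can be changed, and the revision version must match the one NiFi currently reports for it.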
Upvotes: 2