Reputation: 425
If NiFi is writing to disk using PutHDFS or PutFile and is killed (or the process goes down) after writing 4 blocks out of 10, will NiFi continue from the 5th block onwards after it restarts, or will it rewrite the whole file and create duplicates?
Upvotes: 3
Views: 1101
Reputation: 18630
When a processor executes, there is a transaction being performed in NiFi...
In your example of PutHDFS or PutFile, the write operation to HDFS or the local filesystem would be the "Operations performed" part above.
If NiFi crashes at that point, before the session has been committed, then when NiFi restarts the flow file will still be in the same queue and the processor will attempt to process it again.
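To illustrate those commit semantics, here is a toy Python model. `ToySession` and `run_once` are hypothetical names, not NiFi's API (the real interface is Java's `ProcessSession`), and the sketch assumes that a crash is equivalent to rolling back to the last committed state:

```python
from collections import deque


class ToySession:
    """Hypothetical stand-in for a NiFi process session (not the real API)."""

    def __init__(self, incoming: deque, outgoing: deque):
        self.incoming = incoming
        self.outgoing = outgoing
        self.pending = []  # flow files taken from the queue but not yet committed

    def get(self):
        # Take a flow file; it only leaves the queue for good on commit().
        ff = self.incoming.popleft()
        self.pending.append(ff)
        return ff

    def commit(self):
        # Transfer everything taken in this session to the outgoing queue at once.
        self.outgoing.extend(self.pending)
        self.pending.clear()

    def rollback(self):
        # Put uncommitted flow files back at the head of the queue, in order.
        for ff in reversed(self.pending):
            self.incoming.appendleft(ff)
        self.pending.clear()


def run_once(incoming, outgoing, write, crash=False):
    session = ToySession(incoming, outgoing)
    ff = session.get()
    try:
        write(ff)  # the "operations performed" step, e.g. the HDFS or file write
        if crash:
            raise RuntimeError("simulated crash before commit")
        session.commit()
    except RuntimeError:
        # Stands in for a restart: only committed state survives,
        # so the flow file is back in its original queue.
        session.rollback()
```

Running `run_once` once with `crash=True` and then again normally leaves the flow file in the outgoing queue exactly once, but the external `write` has run twice. That is why the external side effect is at-least-once rather than resumed.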
The state of the external system (i.e. HDFS or the local filesystem in this case) depends on how that system handles writing data. If the external system has a transaction mechanism and the crash happened before the transaction was committed there, none of the data may be visible; with a filesystem, however, you will probably be left with a partially written file.
PutHDFS writes to a temporary file whose name starts with a "." and only renames it to the final name once the write is complete, so after a crash that partial temp file most likely still remains in HDFS. What happens on the retry then depends on how the processor handles existing files with the same name: these processors have a "Conflict Resolution Strategy" property where you can choose options such as "replace" or "fail" for existing files.
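The temp-file-then-rename pattern, together with a conflict strategy, can be sketched against the local filesystem. This is a hypothetical illustration loosely modelled on the behaviour described above, not NiFi's code: `put_file`, its `strategy` values and the `crash_after` parameter are invented for the example.

```python
import os
from pathlib import Path

STRATEGIES = ("replace", "ignore", "fail")  # hypothetical conflict-resolution choices


def put_file(directory: Path, name: str, chunks, strategy: str = "fail", crash_after=None) -> str:
    """Write chunks to a dot-prefixed temp file, then rename it to the final name.

    crash_after simulates the process dying after N chunks have been written.
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    final = directory / name
    temp = directory / ("." + name)
    if final.exists():
        if strategy == "ignore":
            return "ignored"
        if strategy == "fail":
            raise FileExistsError(final)
        # "replace" falls through and rewrites the file from scratch.
    # Opening with "wb" truncates any temp file left over from an earlier crash,
    # so a retry starts again at chunk 1 instead of resuming at chunk 5.
    with open(temp, "wb") as f:
        for i, chunk in enumerate(chunks, start=1):
            f.write(chunk)
            if crash_after is not None and i == crash_after:
                raise RuntimeError("simulated crash mid-write")
    # The final name only appears once the whole file has been written.
    os.replace(temp, final)
    return "written"
```

After a simulated crash at chunk 4, only the hidden `.name` file exists, holding 4 chunks. The retry rewrites all 10 chunks and then renames, so readers never see a half-written file under the final name.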
Upvotes: 1
Reputation: 1853
NiFi protects against hardware and system failures by recording what is happening on each node in that node's FlowFile Repository, which it periodically checkpoints as a snapshot.
If the node was in the middle of writing content when it went down, nothing is corrupted, because the FlowFile Repository is NiFi's write-ahead log. When the node comes back online, it restores its state by first checking for the "snapshot" and ".partial" files. The node either accepts the "snapshot" and deletes the ".partial" (if it exists), or renames the ".partial" file to "snapshot" if the "snapshot" file doesn't exist. The period between checkpoints is configurable in the 'nifi.properties' file; the default is 2 minutes.
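The startup rule for the two checkpoint files can be sketched in a few lines. The file names `snapshot` and `snapshot.partial` and the function `recover_checkpoint` are illustrative assumptions, and replaying the write-ahead journal after the snapshot is omitted. Note that what this recovers is NiFi's own record of flow files, not the bytes already written to the external file.

```python
from pathlib import Path
from typing import Optional


def recover_checkpoint(repo_dir: Path) -> Optional[Path]:
    """Hypothetical sketch of the snapshot/.partial recovery rule described above."""
    snapshot = repo_dir / "snapshot"          # illustrative file names
    partial = repo_dir / "snapshot.partial"
    if snapshot.exists():
        # A complete snapshot exists: the half-finished checkpoint is discarded.
        if partial.exists():
            partial.unlink()
    elif partial.exists():
        # Only the partial file exists: promote it to be the snapshot.
        partial.rename(snapshot)
    else:
        # Fresh repository: nothing to recover.
        return None
    return snapshot
```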
Hence, it will restore from the 5th block onwards.
Upvotes: 1