Reputation: 13691
I am writing a custom processor, in which I transform the content of the FlowFile. For simplicity in this question, it shall just write the same content.
public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
FlowFile flowFile = session.get();
if (flowFile == null) return;
session.write(
flowFile,
(input, output) -> input.transferTo(output) // Do transfomration here
);
session.transfer(REL_SUCCESS, flowFile);
}
If I am right, what I do there, is replacing the content of the original FlowFile by - in this case the identity of the content.
Now there is this methode
session.create(flowFile);
which creates a new FlowFile.
This made me think, whether I am doing it all wrong, writing in the original FlowFile. Doing it is so simple, so it feels ok. But maybe it would be better to create a new FlowFile?
What are the implications of both ways and when is it correct to chose the one or the other? Or will the observable behavior be the same?
This would be some code, which creates a new FlowFile, writing the content of the original one:
public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
FlowFile flowFile = session.get();
if (flowFile == null) return;
FlowFile newFile = session.create(flowFile);
session.write(
newFile,
output) -> session.read(flowFile).transferTo(output) // Do transfomration here
);
session.transfer(REL_SUCCESS, newFile);
}
Upvotes: 1
Views: 1680
Reputation: 1771
it depends on your requirement, but in general you should write in the original FlowFile.
if you make a new flowfile with session.create()
, a new UUID will be associated to your flowfile and it will make your data provenance messy for nothing.
Making a new flowFile is worth when :
- you want to keep both flowfile, the original and the new one
- read from external source (kafka, hdfs, FS ...)
- 1 input N output (for example splitRecords processor)
Upvotes: 4
Reputation: 28599
Upvotes: 4