derM
derM

Reputation: 13691

Creating new FlowFile or reusing old one

I am writing a custom processor, in which I transform the content of the FlowFile. For simplicity in this question, it shall just write the same content.

public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
  FlowFile flowFile = session.get();
  if (flowFile == null) return;

  session.write(
             flowFile,
             (input, output) -> input.transferTo(output) // Do transfomration here
  );
  session.transfer(REL_SUCCESS, flowFile);
}

If I am right, what I do there, is replacing the content of the original FlowFile by - in this case the identity of the content.

Now there is this methode

session.create(flowFile);

which creates a new FlowFile.

This made me think, whether I am doing it all wrong, writing in the original FlowFile. Doing it is so simple, so it feels ok. But maybe it would be better to create a new FlowFile?
What are the implications of both ways and when is it correct to chose the one or the other? Or will the observable behavior be the same?

This would be some code, which creates a new FlowFile, writing the content of the original one:

public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
  FlowFile flowFile = session.get();
  if (flowFile == null) return;

  FlowFile newFile = session.create(flowFile);
  session.write(
             newFile,
             output) -> session.read(flowFile).transferTo(output) // Do transfomration here
  );
  session.transfer(REL_SUCCESS, newFile);
}

Upvotes: 1

Views: 1680

Answers (2)

maxime G
maxime G

Reputation: 1771

it depends on your requirement, but in general you should write in the original FlowFile.

if you make a new flowfile with session.create() , a new UUID will be associated to your flowfile and it will make your data provenance messy for nothing.

Making a new flowFile is worth when :
- you want to keep both flowfile, the original and the new one
- read from external source (kafka, hdfs, FS ...)
- 1 input N output (for example splitRecords processor)

Upvotes: 4

daggett
daggett

Reputation: 28599

  • session.write(flowFile, callback) - Executes the given callback against the content corresponding to the given flowFile.
    • Useful when you don't need the original content.
  • session.create() - Creates a new FlowFile in the repository with no content and without any linkage to a parent FlowFile.
    • This method is appropriate only when data is received or created from an external system.
  • session.create(parentFlowFile) - Creates a new FlowFile in the repository with no content but with a parent linkage to parentFlowFile. The newly created FlowFile will inherit all of the parent's attributes except for the UUID.
    • Use when you want to output both - the original flowfile and the new one.
    • Or you want to create multiple flow files from input.

Upvotes: 4

Related Questions