fuyi
fuyi

Reputation: 2639

How does Apache beam DoFn handle the message when exception occurs?

I have a general question regarding the DoFn. according to this doc:

If required, a fresh instance of the argument DoFn is created on a worker, and the DoFn.Setup method is called on this instance. This may be through deserialization or other means. A PipelineRunner may reuse DoFn instances for multiple bundles. A DoFn that has terminated abnormally (by throwing an Exception) will never be reused.

  1. So the DoFn instance will never be reused in case of exception, then how about the element DoFn is processing? will it be reprocessed by new instance or simply discarded?
  2. If the message gets discarded? Is there any mechanism to recover it?

Upvotes: 2

Views: 872

Answers (1)

robertwb
robertwb

Reputation: 5104

If a DoFn throws an exception, that will abort the bundle and work on that element will be aborted. Generally, with a production-level runner, this work will be re-tried a certain number of times (e.g. for Dataflow batch pipelines, up to 4 times, for Dataflow streaming indefinitely) before the runner gives up and fails the full pipeline.

In no instance is the message silently dropped.

See https://beam.apache.org/documentation/runtime/model/ for a more detailed explanation of what the model requires here.

Upvotes: 2

Related Questions