kpax

Reputation: 661

Optimizing repeated transformations in Apache Beam/Dataflow

I wonder whether Apache Beam/Google Dataflow is smart enough to recognize repeated transformations in the dataflow graph and run them only once. For example, if I have two branches that both group elements by key under the hood, will the execution engine recognize that GroupByKey() has the same input in both cases and run it only once? Or do I need to restructure the pipeline manually so that a single GroupByKey() precedes all the branches that use it?
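Here is a minimal sketch of the manual restructuring. It uses plain Python rather than the Beam SDK so it runs anywhere. `group_by_key`, `sum_per_key` and `max_per_key` are hypothetical stand-ins, and the call counter only shows how many times the grouping "shuffle" would run in each layout.

```python
from collections import defaultdict

# Counts how many times the (hypothetically expensive) grouping step runs.
calls = {"group_by_key": 0}


def group_by_key(pairs):
    """Toy stand-in for a GroupByKey shuffle: (k, v) pairs -> {k: [v, ...]}."""
    calls["group_by_key"] += 1
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def sum_per_key(grouped):
    return {k: sum(vs) for k, vs in grouped.items()}


def max_per_key(grouped):
    return {k: max(vs) for k, vs in grouped.items()}


data = [("a", 1), ("b", 5), ("a", 3), ("b", 2)]

# Naive layout: each branch groups independently, so the shuffle runs twice.
calls["group_by_key"] = 0
naive = (sum_per_key(group_by_key(data)), max_per_key(group_by_key(data)))
naive_calls = calls["group_by_key"]

# Restructured layout: group once, then fan the single result out to both branches.
calls["group_by_key"] = 0
grouped = group_by_key(data)
shared = (sum_per_key(grouped), max_per_key(grouped))
shared_calls = calls["group_by_key"]

print(naive_calls, shared_calls)  # 2 1
print(shared)  # ({'a': 4, 'b': 7}, {'a': 3, 'b': 5})
```

In a Beam pipeline you get the same shape by applying `beam.GroupByKey()` once and then applying two separate transforms to the resulting PCollection. Each transform applied to a PCollection starts a new branch. One caveat: for simple aggregations such as sums and maxima, two `CombinePerKey` transforms may still be cheaper than one shared `GroupByKey`. Runners such as Dataflow can pre-combine values before the shuffle.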

Upvotes: 2

Views: 393

Answers (1)

Pablo

Reputation: 11041

As you may have inferred, this behavior is runner-dependent. Each runner implements its own optimization logic.

  • The Dataflow Runner does not currently support this optimization.
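A toy planner makes the optimization in question concrete. This is a plain-Python sketch, not Beam or Dataflow code. `Step` and `planned_steps` are hypothetical, and it assumes that two steps are identical whenever they have the same transform name and the same inputs. That is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """Toy pipeline node: a named transform applied to upstream steps.

    frozen=True makes steps hashable and compared by value, so two steps
    with the same transform name and the same inputs are equal.
    """
    transform: str
    inputs: Tuple["Step", ...] = ()


def planned_steps(sinks: List[Step], merge_identical: bool) -> List[str]:
    """Return the transform names that would run to produce every sink.

    merge_identical=False: each applied step runs once (dedupe by object
    identity only). merge_identical=True: steps that are equal by value,
    i.e. the same transform on the same inputs, are merged and run once.
    """
    seen = set()
    plan = []
    stack = list(sinks)
    while stack:
        step = stack.pop()
        key = step if merge_identical else id(step)
        if key in seen:
            continue
        seen.add(key)
        plan.append(step.transform)
        stack.extend(step.inputs)
    return plan


source = Step("ReadFromSource")
# Two branches, each applying its own GroupByKey to the same input.
sums = Step("SumPerKey", (Step("GroupByKey", (source,)),))
maxes = Step("MaxPerKey", (Step("GroupByKey", (source,)),))

naive = planned_steps([sums, maxes], merge_identical=False)
merged = planned_steps([sums, maxes], merge_identical=True)
print(naive.count("GroupByKey"), merged.count("GroupByKey"))  # 2 1
```

The simplification is where the real difficulty lies. Before merging two applications, a runner would have to establish that they are genuinely equivalent: same user code, same configuration, deterministic behavior. That is much harder than comparing names.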

Upvotes: 2
