Reputation: 661
I wonder if Apache Beam / Google Dataflow is smart enough to recognize repeated transformations in the pipeline graph and run them only once. For example, if I have 2 branches:
Both will involve grouping elements by key under the hood. Will the execution engine recognize that GroupByKey() has the same input in both cases and run it only once? Or do I need to manually ensure that GroupByKey() precedes all the branches where it is used?
Upvotes: 2
Views: 393
Reputation: 11041
As you may have inferred, this behavior is runner-dependent. Each runner implements its own optimization logic.
Upvotes: 2