Reputation: 11831
I understand that in the Flink Datastream world parallelism means each slot will get a subset of events [1].
A Flink program consists of multiple tasks (transformations/operators, data sources, and sinks). A task is split into several parallel instances for execution and each parallel instance processes a subset of the task’s input data. The number of parallel instances of a task is called its parallelism.
However, how does that work in the Flink SQL world where you need to do joins between tables? If the events in tables A and B are being processed in parallel, then does this not mean that the slots will have only some events for those tables A and B in any given slot? How does Flink ensure consistency of results irrespective of the parallelism used, or does it just copy all events to all slots, in which case I don't understand how parallelism assists?
Upvotes: 3
Views: 878
Reputation: 1353
Upvotes: 2