John

Reputation: 11831

How does parallelism work when using Flink SQL?

I understand that in the Flink DataStream world parallelism means each slot will get a subset of events [1].

A Flink program consists of multiple tasks (transformations/operators, data sources, and sinks). A task is split into several parallel instances for execution and each parallel instance processes a subset of the task’s input data. The number of parallel instances of a task is called its parallelism.
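For context, here is a minimal DataStream sketch of the behaviour that quoted paragraph describes; the elements, the map function, and the parallelism of 4 are purely illustrative:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ParallelMapSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(4); // default parallelism for all operators in this job

            env.fromElements(1, 2, 3, 4, 5, 6, 7, 8)
               // this map operator runs as 4 parallel instances;
               // each instance processes only a subset of the elements
               .map(i -> i * 10)
               .print();

            env.execute("parallel map sketch");
        }
    }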

However, how does that work in the Flink SQL world, where you need to do joins between tables? If the events in tables A and B are being processed in parallel, doesn't that mean any given slot will only hold some of the events from tables A and B? How does Flink ensure consistency of results irrespective of the parallelism used, or does it just copy all events to every slot, in which case I don't understand how parallelism helps?

Upvotes: 3

Views: 878

Answers (1)

Sauron

Reputation: 1353

  • When a join is executed, Flink redistributes the data across the parallel instances based on the join key. This means that events with the same join key from tables A and B will be sent to the same parallel instance for processing. Flink achieves this by using a hash-based partitioning strategy (see the sketch after this list).
  • By partitioning the data based on the join key, Flink ensures that all events with the same key are processed together, regardless of the parallelism level.
  • The parallelism level determines the number of parallel instances or slots available for processing. Each one receives a subset of the data according to the partitioning strategy (here, the hash of the join key).
  • Flink does not copy all events to all slots, as that would be inefficient and defeat the purpose of parallelism. Instead, Flink leverages parallelism to distribute the workload across multiple slots.
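A minimal Table API / SQL sketch of the point above, assuming two example tables A and B joined on a column `id`; the schemas, the datagen connector options, and the parallelism of 4 are illustrative, not something prescribed by the question:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

    public class SqlJoinParallelismSketch {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(4); // four parallel subtasks per operator

            StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

            // Two illustrative tables that share the join key `id`
            tEnv.executeSql(
                "CREATE TEMPORARY TABLE A (id INT, payload STRING) WITH (" +
                " 'connector' = 'datagen', 'fields.id.min' = '1', 'fields.id.max' = '10')");
            tEnv.executeSql(
                "CREATE TEMPORARY TABLE B (id INT, info STRING) WITH (" +
                " 'connector' = 'datagen', 'fields.id.min' = '1', 'fields.id.max' = '10')");

            // The planner turns this join into a hash exchange on `id`: rows from A and B
            // with the same id are routed to the same parallel subtask, so the join result
            // is the same no matter which parallelism is configured.
            tEnv.executeSql(
                "SELECT A.id, A.payload, B.info FROM A JOIN B ON A.id = B.id").print();
        }
    }

Running this with different values for env.setParallelism produces the same join results; only the number of subtasks sharing the work changes.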

Upvotes: 2
