Maher Marwani
Maher Marwani

Reputation: 43

what is the difference between parallelism and parallel computing in Flink?

I have confusion in the number of tasks that can work in parallel in Flink,

Can someone explain to me:

Thank you.

Upvotes: 2

Views: 2852

Answers (1)

David Anderson
David Anderson

Reputation: 43707

Flink uses the term parallelism in a pretty standard way -- it refers to running multiple copies of the same computation simultaneously on multiple processors, but with different data. When we speak of parallelism with respect to Flink, it can apply to an operator that has parallel instances, or it can apply to a pipeline or job (composed of a several operators).

In Flink it is possible for several operators to work separately and concurrently. E.g., in this job

source ---> map ---> sink

the source, map, and sink could all be running simultaneously in separate processors, but we wouldn't call that parallel computation. (Distributed, yes.)

In a typical Flink deployment, the number of task slots equals the parallelism of the job, and each slot is executing one complete parallel slice of the application. Each parallel instance of an operator chain will correspond to a task. So in the simple example above, the source, map, and sink can all be chained together and run in a single task. If you deploy this job with a parallelism of two, then there will be two tasks. But you could disable the chaining, and run each operator in its own task, in which case you'd be using six tasks to run the job with a parallelism of two.

Yes, with a KeyedStream, the number of distinct keys is an upper bound on the parallelism.

CEP can run in parallel if it is operating on a KeyedStream (in which case, the pattern matching is being done independently for each key).

Upvotes: 1

Related Questions