I have confusion in the number of tasks that can work in parallel in Flink, Can someone explain to me: what is the number of parallelism in a distributed system? and its relation to Flink terminology In Flink, is it the same as we say 2 parallelism = 2 tasks work in parallel? In Flink, if 2 operators work separately but the number of parallelism in each one of them is 1, does that count as parallel computation? Is it true that in a KeyedStream, the maximum number of parallelism is the number of keys? Does the Current CEP engine in Flink able to work in more than 1 task? Thank you.

parallel-processingapache-flinkflink-streamingflink-cep

Maher Marwani

Reputation: 43

what is the difference between parallelism and parallel computing in Flink?

I have confusion in the number of tasks that can work in parallel in Flink,

Can someone explain to me:

what is the number of parallelism in a distributed system? and its relation to Flink terminology
In Flink, is it the same as we say 2 parallelism = 2 tasks work in parallel?
In Flink, if 2 operators work separately but the number of parallelism in each one of them is 1, does that count as parallel computation?
Is it true that in a KeyedStream, the maximum number of parallelism is the number of keys?
Does the Current CEP engine in Flink able to work in more than 1 task?

Thank you.

Upvotes: 2

Answers (1)

David Anderson

Reputation: 43707

Flink uses the term parallelism in a pretty standard way -- it refers to running multiple copies of the same computation simultaneously on multiple processors, but with different data. When we speak of parallelism with respect to Flink, it can apply to an operator that has parallel instances, or it can apply to a pipeline or job (composed of a several operators).

In Flink it is possible for several operators to work separately and concurrently. E.g., in this job

source ---> map ---> sink

the source, map, and sink could all be running simultaneously in separate processors, but we wouldn't call that parallel computation. (Distributed, yes.)

In a typical Flink deployment, the number of task slots equals the parallelism of the job, and each slot is executing one complete parallel slice of the application. Each parallel instance of an operator chain will correspond to a task. So in the simple example above, the source, map, and sink can all be chained together and run in a single task. If you deploy this job with a parallelism of two, then there will be two tasks. But you could disable the chaining, and run each operator in its own task, in which case you'd be using six tasks to run the job with a parallelism of two.

Yes, with a KeyedStream, the number of distinct keys is an upper bound on the parallelism.

CEP can run in parallel if it is operating on a KeyedStream (in which case, the pattern matching is being done independently for each key).

Upvotes: 1

what is the difference between parallelism and parallel computing in Flink?

Answers (1)

Related Questions