Parallelism in flink kafka source causes nothing to execute

Question

I'm a beginner kafka and flink enthusiast. I noticed something troubling. When i increase my parallelism of a kafka job to anything more than 1, i get no windows to execute their processes. I wish to use parallelism to increase analysis speed.

Look at the image examples from Apache Flink Web Dashboard which visualizes the issue. This is the exact same code and the exact same ingested data-set, the difference is only parallelism. In the first example the ingested data flows through the window functions, but when parallelism is increased the data just piles up in the first window function which never executes. It stays like this forever and never produces any error.

Parallelism 1, everything flows fine, output from window 1 is sent to window 2 image

Parallelism 10, everything stops at the first window image

The source used in the code is KafkaSource, FlinkKafkaConsumer seems to work fine with the same setup but is deprecated so i wish not to use it.

Thanks for any ideas!

David Anderson · Accepted Answer

The issue (is almost certainly) that the Kafka topic being consumed has fewer partitions than the configured parallelism. The new KafkaSource handles this situation differently than FlinkKafkaConsumer.

An event-time window waits for the arrival of a watermark indicating that the stream is now complete up through the end-time of the window. When your KafkaSource operator has 10 instances, some of which aren't receiving any data, those idle instances are holding back the watermark. Basically, Flink doesn't know that those instances aren't expected to ever produce data -- instead it's waiting for them to be assigned work to do.

You can fix this by doing one of the following:

Reduce Flink's parallelism to be less than or equal to the number of Kafka partitions.
Configure your WatermarkStrategy to use withIdleness(duration) so that the idle instances will recognize that they aren't doing anything, and (temporarily) remove themselves from being involved with watermarking. (And if those instances are ever assigned splits/partitions to consume, they'll resume doing watermarking.)

Parallelism in flink kafka source causes nothing to execute

Answers (1)

Related Questions