Reputation: 533
I have a Flink operator setup in Kubernetes with 6 task managers, and the Kafka topics are created with 6 partitions. I can confirm that when messages are published to the topic, all 6 partitions receive a fair share of the records. However, when I submit the Flink job that consumes from this topic, only 1 or 2 task managers take the processing load while the remaining 4 or 5 sit idle.
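For context, the job has roughly the shape sketched below (a sketch only; the topic name, the field used in the keyBy, and the class names are placeholders, not my actual code):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SkewedJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(6); // one slot per task manager

        // Placeholder broker/topic/group names, not the real ones
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("events")                       // topic with 6 partitions
                .setGroupId("flink-consumer")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
           // keyed processing downstream of the source (placeholder key field)
           .keyBy(value -> value.split(",")[0])
           .map(value -> value.toUpperCase())
           .print();

        env.execute("skewed-job");
    }
}
```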
I have tested this with different messages, but the behavior is the same. When I restart the Flink operator, a different task manager picks up the load, but the other task managers are still idle.
Can someone help me understand how to fix this behavior?
Thanks in advance.
Upvotes: 0
Views: 556
Reputation: 43697
This sort of skew is most often seen when there aren't very many distinct keys. Flink hashes each key into a key group and divides the key groups among the parallel subtasks, so with only a handful of distinct keys the keys used in the keyBy can easily end up unevenly spread across the task managers, leaving some subtasks with no data at all. If you can use a KeySelector that produces many more, finer-grained keys, that is one way to solve this.
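For example, here is a sketch of a finer-grained KeySelector (the Event class and its deviceId/eventId fields are assumptions about your data, not taken from the question): keying on the natural, low-cardinality field plus a bucket derived from a higher-cardinality attribute spreads records across many more key groups.

```java
import org.apache.flink.api.java.functions.KeySelector;

// Sketch: instead of keying on the low-cardinality field alone, append a bucket
// derived from another attribute so records spread across many more key groups.
// The Event fields below are assumptions for illustration.
public class FineGrainedKeySelector implements KeySelector<FineGrainedKeySelector.Event, String> {

    public static class Event {
        public String deviceId;   // low-cardinality "natural" key
        public String eventId;    // higher-cardinality attribute used for bucketing
    }

    private static final int BUCKETS = 128;

    @Override
    public String getKey(Event event) {
        // e.g. "device-42|17" rather than just "device-42"
        int bucket = Math.floorMod(event.eventId.hashCode(), BUCKETS);
        return event.deviceId + "|" + bucket;
    }
}
```

A common pattern is to pre-aggregate on this compound key and then apply a second keyBy on the original key to combine the partial results per key.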
See https://stackoverflow.com/a/59525969/2000823 for another approach.
Upvotes: 1