Keshav Lodhi

Reputation: 3182

How to control Flink jobs to be distributed/load-balanced properly amongst task-managers in a cluster?

How can I control how Flink jobs are distributed/load-balanced across the task managers in a cluster, either evenly or in some other way that lets us set threshold limits on free slots, physical memory, CPU cores, JVM heap size, and so on?

For example, I have 3 task managers in a cluster, and one of them is heavily loaded even though the other task managers have many free slots and plenty of other resources available.

If a particular task manager is heavily loaded, it can cause many problems, e.g. memory issues, heap issues, high back-pressure, Kafka lag (which may slow down the source and sink operations), etc., which could lead to a container restarting many times.

Note: I may not have listed every possible issue caused by this behaviour, but in general a distributed system should not suffer from this kind of imbalance.

Upvotes: 0

Views: 996

Answers (1)

David Anderson

Reputation: 43409

It sounds like cluster.evenly-spread-out-slots is the option you're looking for. See the docs. With this option set to true, Flink will try to always use slots from the least used TM when there aren’t any other preferences. In other words, sources will be placed in the least used TM, and then the rest of the topology will follow (consumers will try to be co-located with their producers, to keep communication local).
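
For illustration, a minimal sketch of the relevant entry in flink-conf.yaml might look like the following. The evenly-spread-out key is the option discussed above; the slot and memory settings are purely hypothetical placeholders:

    # Spread slots across the least-used TaskManagers instead of
    # filling up one TaskManager before moving on to the next.
    cluster.evenly-spread-out-slots: true

    # Hypothetical sizing, purely for illustration -- tune for your own cluster.
    taskmanager.numberOfTaskSlots: 4
    taskmanager.memory.process.size: 4096m

Since this is a cluster-level scheduling option, it needs to be set in the cluster's configuration (where the JobManager reads it) rather than per job.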

This option is only going to be helpful if you have a static set of TMs (e.g., a standalone cluster, rather than a cluster which is dynamically starting and stopping TMs as needed).

For what it's worth, in many ways per-job (or application mode) clusters are easier to manage than session clusters.

Upvotes: 1
