Reputation: 3182
How to control Flink's jobs to be distributed/load-balanced(Evenly or another way where we can set the threshold limit for Free-Slots
/Physical MEM
/CPU Cores
/JVM Heap Size
etc..) properly amongst task-managers in a cluster?
For example, I have 3 task-managers in a cluster where one task-manager is heavily loaded even though there are many Free Slots
and other resources are available in other task-managers in a cluster.
So if a particular task-manager is heavily loaded then it may cause many problems e.g. Memory issues
, heap issues
, high back-pressure
, Kafka lagging
(May slow down the source and sink operation), etc which could lead a container to restart many times.
Note: I may have not mentioned all the possible issues here due to this limitation but in general in distributed systems
we should not have such limitations.
Upvotes: 0
Views: 996
Reputation: 43409
It sounds like cluster.evenly-spread-out-slots
is the option you're looking for. See the docs. With this option set to true, Flink will try to always use slots from the least used TM when there aren’t any other preferences. In other words, sources will be placed in the least used TM, and then the rest of the topology will follow (consumers will try to be co-located with their producers, to keep communication local).
This option is only going to be helpful if you have a static set of TMs (e.g., a standalone cluster, rather than a cluster which is dynamically starting and stopping TMs as needed).
For what it's worth, in many ways per-job (or application mode) clusters are easier to manage than session clusters.
Upvotes: 1