Reputation: 259
Here is the error message:
2019-10-27 05:32:57,087 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (34/40) (95aac9e47f777ddc73c7a29cc1091911) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (35/40) (5181fb35b0a2eab588dd7ed2eb902bbd) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (36/40) (bf4aac9423bdecaeeb7e6ac37001d73d) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (37/40) (31f8ee4d7adbcfd5de21b4cbb83c5e05) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (38/40) (8ba11f69e8e5ee2aacaa276136ad3bd0) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (39/40) (1a1e38ede6b8d398b50b8fe7de2c6cb2) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Sink: Unnamed (40/40) (7fbb095da45b2d2392874fe4fa5c916d) switched from CREATED to SCHEDULED.
2019-10-27 05:37:57,088 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Flink Streaming Job (4e5011eb97e695cfb2d05048534b097a) switched from state RUNNING to FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate all requires slots within timeout of 300000 ms. Slots required: 152, slots allocated: 150, previous allocation IDs: []
My parallelism settings:
source: 32
flatmap: 80
sink: 40
Did the JobManager try to request 152 slots from the ResourceManager, and did the job fail because the ResourceManager didn't have enough slots? Can't the ResourceManager get more slots from other TaskManagers when no slots are available any more?
Upvotes: 4
Views: 6061
Reputation: 2921
The number of available slots is numberOfTaskmanagers x taskmanager.numberOfTaskSlots (e.g. 75 taskmanagers with 2 slots each result in 150 slots). Flink itself can't trigger any kind of dynamic scaling. All you can do is start more taskmanagers manually, or change the taskmanager configuration and restart the taskmanagers.
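The slot arithmetic can be checked with a small Python sketch. It is only an illustration, not Flink code: the helper names are made up, and the 75 x 2 cluster layout is the example from this answer, not something confirmed by the log. One possible reading of the error is that 152 = 32 + 80 + 40, which is what you would expect if every operator ends up in its own slot sharing group; with all operators in the default group, the job would only need as many slots as the highest parallelism. That reading is an assumption.

```python
import math

def available_slots(num_taskmanagers: int, slots_per_taskmanager: int) -> int:
    """Total slots offered by the cluster: taskmanagers x taskmanager.numberOfTaskSlots."""
    return num_taskmanagers * slots_per_taskmanager

def required_slots(parallelisms: dict, slot_sharing: bool) -> int:
    """Slots a job needs.

    With slot sharing (all operators in one group) the highest parallelism
    is enough; without it (assumed: one group per operator) the
    parallelisms add up.
    """
    values = list(parallelisms.values())
    return max(values) if slot_sharing else sum(values)

def taskmanagers_needed(slots: int, slots_per_taskmanager: int) -> int:
    """Smallest number of taskmanagers that provides at least `slots` slots."""
    return math.ceil(slots / slots_per_taskmanager)

# Parallelism settings from the question.
parallelisms = {"source": 32, "flatmap": 80, "sink": 40}

# Hypothetical cluster layout taken from the example in the answer.
cluster = available_slots(num_taskmanagers=75, slots_per_taskmanager=2)

needed = required_slots(parallelisms, slot_sharing=False)
print("available:", cluster)
print("required without slot sharing:", needed)
print("required with slot sharing:", required_slots(parallelisms, slot_sharing=True))
print("taskmanagers needed:", taskmanagers_needed(needed, 2))
```

Under these assumptions the cluster is two slots short, so either one more taskmanager with 2 slots or letting the operators share slots would let the job be scheduled.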
If a taskmanager dies while the job is running, you can define a restart strategy (keep in mind that you need to enable checkpointing for that): https://ci.apache.org/projects/flink/flink-docs-stable/dev/task_failure_recovery.html#restart-strategies
If your taskmanagers die and don't get restarted, it is quite likely a YARN issue.
Upvotes: 1