jay Wong

Reputation: 259

Question about NoResourceAvailableException in Flink

Here is the error message:

2019-10-27 05:32:57,087 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (34/40) (95aac9e47f777ddc73c7a29cc1091911) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (35/40) (5181fb35b0a2eab588dd7ed2eb902bbd) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (36/40) (bf4aac9423bdecaeeb7e6ac37001d73d) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (37/40) (31f8ee4d7adbcfd5de21b4cbb83c5e05) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (38/40) (8ba11f69e8e5ee2aacaa276136ad3bd0) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (39/40) (1a1e38ede6b8d398b50b8fe7de2c6cb2) switched from CREATED to SCHEDULED.
2019-10-27 05:32:57,087 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Sink: Unnamed (40/40) (7fbb095da45b2d2392874fe4fa5c916d) switched from CREATED to SCHEDULED.
2019-10-27 05:37:57,088 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job Flink Streaming Job (4e5011eb97e695cfb2d05048534b097a) switched from state RUNNING to FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate all requires slots within timeout of 300000 ms. Slots required: 152, slots allocated: 150, previous allocation IDs: []

My parallelism settings:

source : 32
flatmap : 80
sink : 40

Is the JobManager asking the ResourceManager for 152 slots, and does the job fail because the ResourceManager doesn't have enough? Can't the ResourceManager get more slots from other TaskManagers when no free slots remain?

Upvotes: 4

Views: 6061

Answers (1)

TobiSH

Reputation: 2921

The number of available slots is numberOfTaskmanagers x taskmanager.numberOfTaskSlots (e.g., 75 taskmanagers with 2 slots each give 150 slots). Flink itself can't trigger any kind of dynamic scaling. All you can do is start more taskmanagers manually, or change the taskmanager configuration and restart the taskmanagers.
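
To make the arithmetic concrete, here is a minimal sketch of the slot calculation. The 75 x 2 cluster is only the example figure from this answer, not the asker's confirmed setup, and the function names are hypothetical. Note that the 152 required slots in the error equal 32 + 80 + 40, which suggests (an assumption, not something the log confirms) that each subtask is requesting its own slot rather than sharing one.

```python
def total_slots(num_taskmanagers: int, slots_per_taskmanager: int) -> int:
    """Slots offered by the cluster: taskmanagers x taskmanager.numberOfTaskSlots."""
    return num_taskmanagers * slots_per_taskmanager


def extra_taskmanagers_needed(required: int, num_taskmanagers: int,
                              slots_per_taskmanager: int) -> int:
    """How many more taskmanagers must be started to cover `required` slots."""
    missing = required - total_slots(num_taskmanagers, slots_per_taskmanager)
    if missing <= 0:
        return 0
    # Integer ceiling division: a partially used taskmanager still has to be started.
    return -(-missing // slots_per_taskmanager)


# Parallelism values from the question.
parallelism = {"source": 32, "flatmap": 80, "sink": 40}

# Assumption: no slot sharing, so every subtask needs its own slot.
required = sum(parallelism.values())

# Assumption: the example cluster from this answer (75 taskmanagers, 2 slots each).
available = total_slots(75, 2)
extra = extra_taskmanagers_needed(required, 75, 2)

print(f"required={required}, available={available}, extra taskmanagers={extra}")
```

Under these assumptions the job is two slots short, so one additional taskmanager with 2 slots would be enough.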

If a taskmanager dies while the job is running, you can define a restart strategy (keep in mind that you need to enable checkpointing for that): https://ci.apache.org/projects/flink/flink-docs-stable/dev/task_failure_recovery.html#restart-strategies

If your taskmanagers die and don't get restarted, it is quite likely a YARN issue.

Upvotes: 1
