Reputation: 869
I have deployed a Flink cluster with the following parallelism-related configuration:
jobmanager.heap.mb: 2048
taskmanager.heap.mb: 2048
taskmanager.numberOfTaskSlots: 5
parallelism.default: 2
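With this configuration each task manager should contribute 5 slots, so a single running task manager already covers the default parallelism of 2; for example, a submission along these lines should fit (the jar path is only illustrative):
bin/flink run -p 2 examples/streaming/WordCount.jar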
But if I try to run any example or jar, even with the -p flag, I receive the following error:
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration.
Task to schedule: < Attempt #1 (Source: Custom Source -> Sink: Unnamed (1/1)) @ (unassigned) - [SCHEDULED] > with groupID < 22f48c24254702e4d1674069e455c81a > in sharing group < SlotSharingGroup [22f48c24254702e4d1674069e455c81a] >. Resources available to scheduler:
Number of instances=0, total number of slots=0, available slots=0
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:255)
at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:131)
at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:303)
at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:453)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:326)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:742)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:889)
at org.apache.flink.runtime.executiongraph.restart.FixedDelayRestartStrategy$1.call(FixedDelayRestartStrategy.java:80)
at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Which should not come as a surprise, as the dashboard shows no registered task managers and no available slots.
I tried restarting the cluster several times, but it does not seem to pick up the configuration.
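One quick way to see what the JobManager actually has registered (a sketch; assumes the web/REST interface is on the default port 8081 on localhost):
# list the task managers currently registered with the JobManager
curl http://localhost:8081/taskmanagers
# an empty "taskmanagers" list matches the "Number of instances=0" in the exception above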
Upvotes: 11
Views: 13343
Reputation: 1932
I finally found the solution for this Flink issue in my case. First I will explain the root cause and then the solution.
Check the Flink logs and tail the task-executor log:
tail -500f flink-root-taskexecutor-3-osboxes.out
I found the following entries:
Invalid maximum direct memory size: -XX:MaxDirectMemorySize=8388607T
The specified size exceeds the maximum representable size.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Why does this happen? Because the Java version was wrong: the OS is 64-bit, but I had installed a 32-bit JDK.
Solution: install the correct 64-bit JDK 1.8. After the installation, the error in the task-executor log disappeared.
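A quick way to confirm whether the installed JVM is 64-bit (a sketch; the exact banner varies by vendor and build):
java -version
# a 64-bit build prints something like:
#   java version "1.8.0_xxx"
#   Java(TM) SE Runtime Environment (build ...)
#   Java HotSpot(TM) 64-Bit Server VM (build ..., mixed mode)
# if the last line lacks "64-Bit", the JVM is 32-bit and cannot represent the
# huge -XX:MaxDirectMemorySize value seen in the error above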
My problem was resolved and the Flink cluster now runs perfectly, both locally and in the cloud.
Upvotes: 0
Reputation: 131
I got the same issue, and I remembered hitting a similar problem with Spark: I had newly installed JDK 11 for testing, which changed my JAVA_HOME env var to
/Library/Java/JavaVirtualMachines/jdk-11.jdk/Contents/Home.
So I set JAVA_HOME back to JDK 8 with:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home
and everything ran smoothly. That path is for my Mac; you can find your own JAVA_HOME. Hope that helps.
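On macOS, one way to locate the right path instead of hard-coding it (a sketch; assumes the standard /usr/libexec/java_home helper is available):
# list all installed JDKs and their home directories
/usr/libexec/java_home -V
# export the home of the newest installed JDK 8
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)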
Upvotes: 4
Reputation: 555
The exception simply means there is no Task Manager, hence no slots available to run the job. A Task Manager can go down for many reasons, e.g. a runtime exception or a misconfiguration; check the logs for the exact cause. Restart the cluster and, once the task managers show up in the dashboard, run the job again. You can also define a proper restart strategy in the configuration, such as fixed-delay restart, so that the job retries in case of a genuine failure; see the sketch below.
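For example, a fixed-delay restart strategy can be configured in flink-conf.yaml roughly like this (the values are illustrative):
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s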
Upvotes: 0