Reputation: 31
I am using a Spark Streaming job to execute multiple tasks. It runs fine for around 5-6 hours, but after that it fails with the following exception. The Spark Streaming job is running on a YARN cluster with 20 GB RAM and 8 cores.
Application application_1435667829099_0003 failed 2 times due to AM Container for appattempt_1435667829099_0003_000002 exited with exitCode: 11
For more detailed output, check application tracking page: http://hdp-master:8088/proxy/application_1435667829099_0003/ Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e09_1435667829099_0003_02_000001
Exit code: 11
Stack trace: ExitCodeException exitCode=11:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 11
Failing this attempt. Failing the application.
Any suggestions would be appreciated.
Upvotes: 1
Views: 4834
Reputation: 429
I'm not sure whether the solution I found will resolve your issue, but it resolved mine, which was also related to exitCode: 11 (reason: Max number of executor failures (16) reached).
The root cause in my case was calling sparkContext.stop() at the end of the main function. It stops all executors, but some asynchronous process (Akka trying to send a message) keeps running and tries to reach the driver/executors. Those calls cannot succeed because all executors and the driver have already been shut down, so it retries several times and then exits with exitCode: 11.
Solution: remove the sparkContext.stop() call from your code and leave the cleanup to the JVM shutdown.
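For illustration, here is a minimal sketch of a Spark Streaming main that blocks on awaitTermination() and never calls sparkContext.stop() explicitly (the object name, app name, and batch interval are placeholders, not from the original question):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingApp")
    // Batch interval is an assumption; tune it for your workload
    val ssc = new StreamingContext(conf, Seconds(10))

    // ... define your DStream sources and transformations here ...

    ssc.start()
    ssc.awaitTermination()
    // No explicit sparkContext.stop() here: stopping the context manually
    // can race with in-flight Akka messages, as described above
  }
}
```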
Upvotes: 0
Reputation: 1
In my case, there was this line just before the exception in the node manager's log:
INFO org.apache.spark.deploy.yarn.ApplicationMaster: Final app status: FAILED,
exitCode: 11, (reason: Max number of executor failures (16) reached)
But I'm sure this is only a symptom of the original issue. Take a close look at the node manager logs.
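For reference, the executor-failure threshold that triggers this message is configurable in Spark on YARN (the value shown is just an example; raising the limit only masks the symptom, so the underlying executor failures still need investigating):

```
# spark-defaults.conf (or pass via --conf on spark-submit)
spark.yarn.max.executor.failures  32
```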
Upvotes: 0