void

Reputation: 2533

Spark streaming job fails after getting stopped by Driver

I have a Spark Streaming job which reads data from Kafka and does some operations on it. I am running the job on a YARN cluster (Spark 1.4.1) that has two nodes, each with 16 GB RAM and 16 cores.

I pass this configuration to spark-submit:

--master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 3
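
So the full spark-submit invocation looks roughly like this (the class and jar names here are placeholders, not the real ones from my job):

spark-submit --master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 3 --class com.example.StreamingJob streaming-job.jar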

The job returns this error and exits after running for a short while:

INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 11,
(reason: Max number of executor failures reached)

.....

ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0:
Stopped by driver

Update:

I also found these logs:

INFO yarn.YarnAllocator: Received 3 containers from YARN, launching executors on 3 of them.....

INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down.

....

INFO yarn.YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them.

INFO yarn.ExecutorRunnable: Starting Executor Container.....

INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down...

INFO yarn.YarnAllocator: Completed container container_e10_1453801197604_0104_01_000006 (state: COMPLETE, exit status: 1)

INFO yarn.YarnAllocator: Container marked as failed: container_e10_1453801197604_0104_01_000006. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_e10_1453801197604_0104_01_000006
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
    at org.apache.hadoop.util.Shell.run(Shell.java:487)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1

What might be the reasons for this? I'd appreciate some help.

Thanks

Upvotes: 10

Views: 4204

Answers (2)

Tin Nguyen

Reputation: 429

I ran into the same issue and found one way to fix it: remove sparkContext.stop() at the end of the main function and leave the shutdown to the JVM/GC.

The Spark team has resolved the issue in Spark core; however, so far the fix is only on the master branch. We need to wait until it lands in a new release.

https://issues.apache.org/jira/browse/SPARK-12009
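
For illustration, a minimal sketch (the class name, app name, and batch interval are assumptions, not taken from the question) of a streaming main written this way: start the context, block on awaitTermination(), and skip the explicit stop() call at the end.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingJob {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("StreamingJob");
        JavaStreamingContext jssc =
            new JavaStreamingContext(sparkConf, Durations.seconds(10));

        // ... build the Kafka DStream and transformations here ...

        jssc.start();
        jssc.awaitTermination();
        // No explicit jssc.stop() / sparkContext.stop() at the end,
        // per the workaround above; shutdown is left to the framework.
    }
}

The point is simply that nothing after awaitTermination() tears the context down by hand.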

Upvotes: -3

Faisal Ahmed Siddiqui

Reputation: 707

Can you please show your Scala/Java code that reads from Kafka? I suspect you are probably not creating your SparkConf correctly.

Try something like

SparkConf sparkConf = new SparkConf().setAppName("ApplicationName");
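
As a rough sketch of how the Kafka read might be wired up with the spark-streaming-kafka artifact for 1.4.x (the ZooKeeper quorum, group id, topic, and batch interval below are placeholders, not taken from your job):

import java.util.Collections;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaRead {
    public static void main(String[] args) throws Exception {
        // Do not hard-code setMaster() here; let spark-submit supply it
        // (yarn-cluster or yarn-client).
        SparkConf sparkConf = new SparkConf().setAppName("ApplicationName");
        JavaStreamingContext jssc =
            new JavaStreamingContext(sparkConf, Durations.seconds(10));

        // One receiver reading the topic "events" with a single thread.
        Map<String, Integer> topics = Collections.singletonMap("events", 1);
        JavaPairReceiverInputDStream<String, String> messages =
            KafkaUtils.createStream(jssc, "zkhost:2181", "consumer-group", topics);

        messages.print();

        jssc.start();
        jssc.awaitTermination();
    }
}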

Also try running the application in yarn-client mode and share the output.

Upvotes: 3
