simpadjo

Reputation: 4017

Spark in cluster mode throws an error if a SparkContext is not started

I have a Spark job that initializes the Spark context only when it is really necessary:

val conf = new SparkConf()
val jobs: List[Job] = ??? // get some jobs
if (jobs.nonEmpty) {
  val sc = new SparkContext(conf)
  sc.parallelize(jobs).foreach(....)
} else {
  // do nothing
}

It worked fine on YARN when the deploy mode was 'client':

spark-submit --master yarn --deploy-mode client

Then I switched the deploy mode to 'cluster', and it started to crash whenever jobs.isEmpty:

spark-submit --master yarn --deploy-mode cluster

Below is the error text:

INFO yarn.Client: Application report for application_1509613523426_0017 (state: ACCEPTED)
17/11/02 11:37:17 INFO yarn.Client: Application report for application_1509613523426_0017 (state: FAILED)
17/11/02 11:37:17 INFO yarn.Client:
     client token: N/A
     diagnostics: Application application_1509613523426_0017 failed 2 times due to AM Container for appattempt_1509613523426_0017_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://xxxxxx.com:8088/cluster/app/application_1509613523426_0017 Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://xxxxxxx/.sparkStaging/application_1509613523426_0017/__spark_libs__997458388067724499.zip
java.io.FileNotFoundException: File does not exist: hdfs://xxxxxxx/.sparkStaging/application_1509613523426_0017/__spark_libs__997458388067724499.zip
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: dev
     start time: 1509622629354
     final status: FAILED
     tracking URL: http://xxxxxx.com:8088/cluster/app/application_1509613523426_0017
     user: xxx
Exception in thread "main" org.apache.spark.SparkException: Application application_1509613523426_0017 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/11/02 11:37:17 INFO util.ShutdownHookManager: Shutdown hook called
17/11/02 11:37:17 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a5b20def-0218-4b0c-b9f8-fdf8a1802e95

Is this a bug in YARN support, or am I missing something?

Upvotes: 0

Views: 2751

Answers (1)

user8874806

Reputation: 36

The SparkContext is what is responsible for communicating with the cluster manager. If the application is submitted to the cluster but a context is never created, YARN cannot determine the state of the application, and this is why you get an error.
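A minimal sketch of a workaround, assuming the fix is simply to create the context unconditionally and stop it cleanly; the Job type, fetchJobs, and process below are hypothetical placeholders for the parts elided in the question:

import org.apache.spark.{SparkConf, SparkContext}

object ConditionalJobRunner {
  type Job = String                       // placeholder: the question's Job type is not shown
  def fetchJobs(): List[Job] = List.empty // placeholder for "get some jobs"
  def process(job: Job): Unit = ()        // placeholder for the per-job work

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    // Create the context unconditionally so the ApplicationMaster always
    // registers with YARN and can report a state for the application.
    val sc = new SparkContext(conf)
    try {
      val jobs = fetchJobs()
      if (jobs.nonEmpty) {
        sc.parallelize(jobs).foreach(process)
      }
      // else: nothing to do, but YARN still sees a properly started application
    } finally {
      sc.stop() // shut down cleanly so the application finishes with a final status
    }
  }
}

Alternatively, if starting a context for an empty job list is too wasteful, move the emptiness check outside the application entirely, so that spark-submit is only invoked when there is work to do.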

Upvotes: 2
