Reputation: 3728
I'm trying to submit a Python Spark application in yarn-cluster mode:
import scala.sys.process._

Seq(
  System.getenv("SPARK_HOME") + "/bin/spark-submit",
  "--master", sparkConfig.getString("spark.master"),
  "--executor-memory", sparkConfig.getString("spark.executor-memory"),
  "--num-executors", sparkConfig.getString("spark.num-executors"),
  "python/"
).!
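For comparison, the same argument list can be assembled from Python. This is a minimal sketch under stated assumptions: `build_submit_cmd`, the `conf` dictionary and the `app.py` path are hypothetical stand-ins for `sparkConfig` and the application path, and it passes `--master yarn --deploy-mode cluster` explicitly instead of relying on a `yarn-cluster` master string.

```python
import os


def build_submit_cmd(conf, app_path, spark_home=None):
    """Build a spark-submit argv list for YARN cluster mode.

    `conf` is a hypothetical dict standing in for sparkConfig; its keys
    mirror the ones used in the Scala snippet above.
    """
    spark_home = spark_home or os.environ.get("SPARK_HOME", "")
    return [
        os.path.join(spark_home, "bin", "spark-submit"),
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--executor-memory", conf["spark.executor-memory"],
        "--num-executors", str(conf["spark.num-executors"]),
        app_path,  # the Python entry point, e.g. a .py file
    ]


cmd = build_submit_cmd(
    {"spark.executor-memory": "2g", "spark.num-executors": 4},
    "app.py",  # hypothetical application path
    spark_home="/opt/spark",  # hypothetical install location
)
# To actually submit: import subprocess; subprocess.run(cmd, check=True)
print(" ".join(cmd))
```

Passing a list (rather than one shell string) keeps paths and values with spaces intact, which is the same reason the Scala version uses a `Seq`.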
I'm getting the following error:
Diagnostics: File does not exist: hdfs://xxxxxx:8020/user/hdfs/.sparkStaging/application_123456789_0138/
I found a related ticket, but it is still open!
Upvotes: 2
Reputation: 19
The HADOOP_CONF_DIR environment variable must be set so that Spark can read the cluster's Hadoop client configuration and find this file:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
Set it in $SPARK_HOME/conf/
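A quick pre-submit check can catch a missing or wrong HADOOP_CONF_DIR. This is a sketch, not part of Spark: the helper name `check_hadoop_conf_dir` is hypothetical, and it only verifies that the variable is set and that the directory contains core-site.xml and yarn-site.xml, the client-side files Spark uses to reach HDFS and the YARN ResourceManager.

```python
import os

# Client-side Hadoop config files Spark needs on YARN (assumed minimal set).
REQUIRED_FILES = ("core-site.xml", "yarn-site.xml")


def check_hadoop_conf_dir(env=None):
    """Return a list of problems with HADOOP_CONF_DIR (empty list means OK)."""
    env = os.environ if env is None else env
    conf_dir = env.get("HADOOP_CONF_DIR")
    if not conf_dir:
        return ["HADOOP_CONF_DIR is not set"]
    if not os.path.isdir(conf_dir):
        return [f"HADOOP_CONF_DIR={conf_dir} is not a directory"]
    # Report every required file that is missing from the directory.
    return [
        f"{name} not found in {conf_dir}"
        for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(conf_dir, name))
    ]


if __name__ == "__main__":
    for problem in check_hadoop_conf_dir():
        print("WARNING:", problem)
```

Running it in the same shell (or wrapper script) that calls spark-submit checks the environment the submission will actually see.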
Upvotes: 0
Reputation: 116
I answered this here. For me, the key was that spark.hadoop.fs.defaultFS must be set in the SparkConf inside Python:
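from pyspark import SparkConf

_app_name = "my-app"               # placeholder application name
_fs_host = "namenode.example.com"  # placeholder: HDFS NameNode host
_rm_host = "rm.example.com"        # placeholder: YARN ResourceManager host

yarn_conf = SparkConf().setAppName(_app_name) \
    .setMaster("yarn") \
    .set("spark.executor.memory", "4g") \
    .set("spark.hadoop.fs.defaultFS", "hdfs://{}:8020".format(_fs_host)) \
    .set("spark.hadoop.yarn.resourcemanager.hostname", _rm_host) \
    .set("spark.hadoop.yarn.resourcemanager.address", "{}:8050".format(_rm_host))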
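If you prefer not to hard-code these values in the application, the same properties can be passed to spark-submit with its `--conf PROP=VALUE` option. A minimal sketch: `hadoop_conf_args` and the host names are hypothetical, and the default ports simply repeat the ones used above (8020 and 8050); check your cluster's core-site.xml and yarn-site.xml for the real values.

```python
def hadoop_conf_args(fs_host, rm_host, fs_port=8020, rm_port=8050):
    """Translate the SparkConf settings above into spark-submit --conf flags."""
    props = {
        "spark.hadoop.fs.defaultFS": f"hdfs://{fs_host}:{fs_port}",
        "spark.hadoop.yarn.resourcemanager.hostname": rm_host,
        "spark.hadoop.yarn.resourcemanager.address": f"{rm_host}:{rm_port}",
    }
    args = []
    for key, value in props.items():
        # Each property becomes a separate "--conf key=value" pair.
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(hadoop_conf_args("namenode.example.com", "rm.example.com")))
```

The resulting list can be spliced into a spark-submit argument list before the application path.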
Upvotes: 0
Reputation: 13
Try adding the HDFS NameNode property to yarn-site.xml.
Ensure that the YARN_CONF_DIR environment variable points to the directory containing yarn-site.xml.
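To confirm that YARN_CONF_DIR points at a yarn-site.xml that actually carries the NameNode address, you can read the property back. A standard-library sketch; it assumes the property meant is fs.defaultFS and that the file uses the usual Hadoop `<configuration><property><name/><value/></property></configuration>` layout. Both helper names are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_data, name):
    """Return the <value> of property `name` in a Hadoop *-site.xml, or None."""
    root = ET.fromstring(xml_data)  # accepts str or bytes
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def default_fs_from_yarn_conf(env=None):
    """Look up fs.defaultFS in $YARN_CONF_DIR/yarn-site.xml (None if absent)."""
    env = os.environ if env is None else env
    conf_dir = env.get("YARN_CONF_DIR")
    if not conf_dir:
        return None
    path = os.path.join(conf_dir, "yarn-site.xml")
    if not os.path.isfile(path):
        return None
    # Read as bytes so an <?xml ... encoding=...?> declaration is honoured.
    with open(path, "rb") as f:
        return read_hadoop_property(f.read(), "fs.defaultFS")


SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>"""

print(read_hadoop_property(SAMPLE, "fs.defaultFS"))
```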
Upvotes: 0
Reputation: 1343
This happens when you spark-submit a job with deploy-mode "cluster" while the code sets the master to "local", e.g.:
val sparkConf = new SparkConf().setAppName("spark-pi-app").setMaster("local[10]");
You have two options.
Option #1: Change the line above to:
val sparkConf = new SparkConf().setAppName("spark-pi-app");
and submit your job as:
./bin/spark-submit --master yarn --deploy-mode cluster --driver-memory 512m --executor-memory 512m --executor-cores 1 --num-executors 3 --jars hadoop-common-{version}.jar,hadoop-lzo-{version}.jar --verbose --queue hadoop-queue --class "SparkPi" sparksbtproject_2.11-1.0.jar
Option #2: Submit your job with deploy-mode "client":
./bin/spark-submit --master yarn --deploy-mode client --driver-memory 512m --executor-memory 512m --executor-cores 1 --num-executors 3 --jars hadoop-common-{version}.jar,hadoop-lzo-{version}.jar --verbose --queue hadoop-queue --class "SparkPi" sparksbtproject_2.11-1.0.jar
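Spark's configuration docs give values set directly on SparkConf precedence over spark-submit flags, so a hard-coded local master wins over `--master yarn`. A small guard can reject the conflicting combination before submitting. This is a sketch: `check_submit_settings` is a hypothetical helper, and `master` must be the value the application will really use.

```python
def check_submit_settings(master, deploy_mode):
    """Reject cluster deploy mode combined with a local master.

    `master` is the effective master, i.e. whatever setMaster() was called
    with, since that takes precedence over spark-submit's --master flag.
    """
    if deploy_mode == "cluster" and master.startswith("local"):
        raise ValueError(
            f"master {master!r} cannot be used with --deploy-mode cluster; "
            "remove setMaster() from the code or submit with --deploy-mode client"
        )
    return master


print(check_submit_settings("yarn", "cluster"))
```

In PySpark, an alternative to deleting `setMaster()` outright is `conf.setIfMissing("spark.master", "local[*]")`, which keeps a local default for development but lets a master supplied by spark-submit take effect.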
Upvotes: 3
Reputation: 168
In my experience with Scala jobs, yarn-cluster mode gives this error when the code calls setMaster("local") somewhere. Try removing any code that sets a local master.
Again, my answer is based on Scala behavior, but I hope it helps.
Upvotes: 2
Reputation: 4427
Are you failing to create a proper Spark context? I suspect that is the issue. I have also updated
Upvotes: 0