Reputation: 331
I am trying to run a fat jar on a Spark cluster using spark-submit. I created the cluster on AWS using the "spark-ec2" executable that ships with the Spark bundle.
The command I am using to run the jar file is:
bin/spark-submit --class edu.gatech.cse8803.main.Main --master yarn-cluster ../src1/big-data-hw2-assembly-1.0.jar
In the beginning it gave me an error saying that at least one of the HADOOP_CONF_DIR or YARN_CONF_DIR environment variables must be set. I didn't know what to set them to, so I used the following command:
export HADOOP_CONF_DIR=/mapreduce/conf
Now the error has changed to:
Could not load YARN classes. This copy of Spark may not have been compiled with YARN support.
Run with --help for usage help or --verbose for debug output
The home directory structure is as follows:
ephemeral-hdfs hadoop-native mapreduce persistent-hdfs scala spark spark-ec2 src1 tachyon
I even set the YARN_CONF_DIR variable to the same value as HADOOP_CONF_DIR, but the error message did not change. I am unable to find any documentation that highlights this issue; most of it just mentions these two variables without giving further details.
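For completeness, the exact command I used for that (with the same guessed path) was:
export YARN_CONF_DIR=/mapreduce/conf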
Upvotes: 3
Views: 5115
Reputation: 1844
You need to build Spark with YARN support to use it against a YARN cluster.
Follow the steps explained here: https://spark.apache.org/docs/latest/building-spark.html
Maven (replace 2.x and 2.x.x with your Hadoop version):
build/mvn -Pyarn -Phadoop-2.x -Dhadoop.version=2.x.x -DskipTests clean package
SBT:
build/sbt -Pyarn -Phadoop-2.x assembly
You can also download a pre-compiled version here: http://spark.apache.org/downloads.html (choose a package that is "pre-built for Hadoop").
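If you want to check whether an existing build already has YARN support, one unofficial way (assuming the assembly jar sits under lib/, which depends on how your distribution was packaged) is to look for the YARN deploy classes inside it:
jar tf lib/spark-assembly-*.jar | grep org/apache/spark/deploy/yarn
If that prints nothing, the assembly was built without the -Pyarn profile.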
Upvotes: 3
Reputation: 2043
Download a prebuilt Spark that supports Hadoop 2.x from https://spark.apache.org/downloads.html
Upvotes: 1
Reputation: 85
The --master argument should be --master spark://hostname:7077, where hostname is the name of your Spark master server. You can also specify this value as spark.master in the spark-defaults.conf file and leave out the --master argument when using spark-submit from the command line. Including the --master argument will override the value set (if it exists) in the spark-defaults.conf file.
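For example, a one-line conf/spark-defaults.conf entry (hostname is a placeholder for your actual master host):
spark.master spark://hostname:7077
With that set, bin/spark-submit --class edu.gatech.cse8803.main.Main ../src1/big-data-hw2-assembly-1.0.jar will be sent to the standalone master without any --master flag.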
Reference: http://spark.apache.org/docs/1.3.0/configuration.html
Upvotes: 0