Akshay Gupta

Reputation: 331

Spark Submit Issue

I am trying to run a fat jar on a Spark cluster using Spark submit. I made the cluster using the "spark-ec2" executable included in the Spark bundle on AWS.

The command I am using to run the jar file is

bin/spark-submit --class edu.gatech.cse8803.main.Main --master yarn-cluster ../src1/big-data-hw2-assembly-1.0.jar

At first it gave me an error saying that at least one of the HADOOP_CONF_DIR or YARN_CONF_DIR environment variables must be set. I didn't know what to set them to, so I used the following command

export HADOOP_CONF_DIR=/mapreduce/conf

Now the error has changed to

Could not load YARN classes. This copy of Spark may not have been compiled with YARN support.
Run with --help for usage help or --verbose for debug output

The home directory structure is as follows

ephemeral-hdfs  hadoop-native  mapreduce  persistent-hdfs  scala  spark  spark-ec2  src1  tachyon

I even set the YARN_CONF_DIR variable to the same value as HADOOP_CONF_DIR, but the error message did not change. I am unable to find any documentation that highlights this issue; most docs just mention these two variables and give no further details.

Upvotes: 3

Views: 5115

Answers (3)

Bacon

Reputation: 1844

You need to compile Spark with YARN support to use it.

Follow the steps explained here: https://spark.apache.org/docs/latest/building-spark.html

Maven:

build/mvn -Pyarn -Phadoop-2.x -Dhadoop.version=2.x.x -DskipTests clean package
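For example, assuming your cluster runs Hadoop 2.4 (substitute the version your cluster actually uses), this would be:

build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package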

SBT:

build/sbt -Pyarn -Phadoop-2.x assembly

You can also download a pre-compiled version from http://spark.apache.org/downloads.html (choose a package pre-built for your Hadoop version).

Upvotes: 3

Jishnu Prathap

Reputation: 2043

Download a prebuilt Spark that supports Hadoop 2.x from https://spark.apache.org/downloads.html
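For example, after downloading a package from that page (the file name below is illustrative; pick the build matching your Hadoop version), extract it and submit from the new directory:

tar -xzf spark-1.3.0-bin-hadoop2.4.tgz
cd spark-1.3.0-bin-hadoop2.4
bin/spark-submit --class edu.gatech.cse8803.main.Main --master yarn-cluster /path/to/big-data-hw2-assembly-1.0.jar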

Upvotes: 1

billz

Reputation: 85

Clusters launched with the spark-ec2 script run Spark's standalone cluster manager, not YARN, so the --master argument should be --master spark://hostname:7077, where hostname is the name of your Spark master server. You can also set this value as spark.master in the spark-defaults.conf file and omit the --master argument when running spark-submit from the command line. Passing --master on the command line overrides any value set in spark-defaults.conf.
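A minimal sketch, with a placeholder hostname (use your actual master's address):

# conf/spark-defaults.conf
spark.master    spark://<master-hostname>:7077

With that in place, the submit command from the question no longer needs --master:

bin/spark-submit --class edu.gatech.cse8803.main.Main ../src1/big-data-hw2-assembly-1.0.jar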

Reference: http://spark.apache.org/docs/1.3.0/configuration.html

Upvotes: 0
