Ram

Reputation: 21

Override Spark's libraries in spark-submit

Our application's Hadoop cluster has Spark 1.5 installed, but due to specific requirements we have developed our Spark job against version 2.0.2. When I submit the job to YARN, I use the --jars option to override the cluster's Spark libraries, but it is still not picking up the Scala library jar. It throws the following error:

ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
    at org.apache.spark.sql.SparkSession$Builder.config(SparkSession.scala:713)
    at org.apache.spark.sql.SparkSession$Builder.appName(SparkSession.scala:704)

Any ideas on how to override the cluster's libraries during spark-submit?

The shell command I use to submit the job is below.

spark-submit \
  --jars test.jar,spark-core_2.11-2.0.2.jar,spark-sql_2.11-2.0.2.jar,spark-catalyst_2.11-2.0.2.jar,scala-library-2.11.0.jar \
  --class Application \
  --master yarn \
  --deploy-mode cluster \
  --queue xxx \
  xxx.jar \
  <params>

Upvotes: 2

Views: 1941

Answers (1)

Erik Schmiegelow

Reputation: 2759

That's fairly easy: YARN doesn't care which version of Spark you are running; it executes whatever jars the YARN client, as packaged by spark-submit, provides. That process ships your application jar along with the Spark libs.

To deploy Spark 2.0 instead of the provided 1.5, just install Spark 2.0 on the host from which you start your job (e.g. in your home directory), set the YARN_CONF_DIR environment variable to point to your Hadoop configuration, and then use that installation's spark-submit.
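A minimal sketch of what that looks like, assuming a Spark 2.0.2 distribution (pre-built for your Hadoop version) has been unpacked into the home directory and that the cluster's Hadoop configuration lives under /etc/hadoop/conf; both paths are assumptions, so adjust them to your environment:

    export SPARK_HOME=$HOME/spark-2.0.2-bin-hadoop2.6   # assumed unpack location
    export YARN_CONF_DIR=/etc/hadoop/conf               # assumed Hadoop conf dir

    # Use the Spark 2.0.2 spark-submit; no --jars overrides are needed,
    # since this client packages its own Spark and Scala 2.11 libs.
    $SPARK_HOME/bin/spark-submit \
      --class Application \
      --master yarn \
      --deploy-mode cluster \
      --queue xxx \
      xxx.jar \
      <params>

With this, the Spark 2.0.2 client ships its own libraries to YARN, so the Scala 2.11 classes your job was compiled against are the ones on the classpath at runtime, and the NoSuchMethodError from the cluster's older Scala goes away.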

Upvotes: 1
