Knight71

Reputation: 2949

How to give dependent jars to spark-submit in cluster mode

I am running Spark with the cluster deploy mode. Below is the command:

JARS=$JARS_HOME/amqp-client-3.5.3.jar,$JARS_HOME/nscala-time_2.10-2.0.0.jar,\
$JARS_HOME/rabbitmq-0.1.0-RELEASE.jar,\
$JARS_HOME/kafka_2.10-0.8.2.1.jar,$JARS_HOME/kafka-clients-0.8.2.1.jar,\
$JARS_HOME/spark-streaming-kafka_2.10-1.4.1.jar,\
$JARS_HOME/zkclient-0.3.jar,$JARS_HOME/protobuf-java-2.4.0a.jar

dse spark-submit -v --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
 --executor-memory 512M \
 --total-executor-cores 3 \
 --deploy-mode "cluster" \
 --master spark://$MASTER:7077 \
 --jars=$JARS \
 --supervise \
 --class "com.testclass" $APP_JAR  input.json \
 --files "/home/test/input.json"

The above command works fine in client mode, but when I use it in cluster mode I get a class-not-found exception:

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
    at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$

In client mode the dependent jars get copied to the /var/lib/spark/work directory, whereas in cluster mode they do not. Please help me get this solved.

EDIT:

I am using NFS and have mounted the same directory on all the Spark nodes under the same name. I still get the error. How is it able to pick up the application jar, which is in the same directory, but not the dependent jars?

Upvotes: 4

Views: 14784

Answers (2)

WoodChopper

Reputation: 4375

As the Spark documentation says, either:

  1. Keep all jars and dependencies in the same local path on all nodes in the cluster, or
  2. Keep the jars on a distributed file system that all nodes have access to.

Spark properties
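
For example, a minimal sketch of the two options (the paths, host names and the shortened jar list are placeholders, not a verified setup):

# Option 1: the jars live under the same local path on every node
dse spark-submit \
 --deploy-mode cluster \
 --master spark://$MASTER:7077 \
 --jars /opt/shared-jars/spark-streaming-kafka_2.10-1.4.1.jar,/opt/shared-jars/kafka_2.10-0.8.2.1.jar \
 --class "com.testclass" /opt/shared-jars/myapp.jar input.json

# Option 2: the jars live on a distributed file system visible to all nodes (e.g. hdfs://)
dse spark-submit \
 --deploy-mode cluster \
 --master spark://$MASTER:7077 \
 --jars hdfs:///libs/spark-streaming-kafka_2.10-1.4.1.jar,hdfs:///libs/kafka_2.10-0.8.2.1.jar \
 --class "com.testclass" hdfs:///apps/myapp.jar input.json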

Upvotes: 1

Shawn Guo

Reputation: 3228

"In client mode the dependent jars are getting copied to the /var/lib/spark/work directory whereas in cluster mode it is not."

In cluster mode the driver program runs inside the cluster, not on the local machine (as in client mode), so the dependent jars must be accessible from the cluster nodes; otherwise the driver program and executors will throw a java.lang.NoClassDefFoundError.

When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster.

Your extra jars can be added via --jars; they will be copied to the cluster automatically.

Please refer to the "Advanced Dependency Management" section of the page below:
http://spark.apache.org/docs/latest/submitting-applications.html
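
A hedged sketch of how the question's submission could look with the dependencies passed through --jars (jar names are reused from the question; whether they resolve on the cluster nodes is an assumption). Note that spark-submit options such as --jars and --files must come before the application jar, since anything placed after it is handed to the main class as an application argument:

JARS=$JARS_HOME/spark-streaming-kafka_2.10-1.4.1.jar,\
$JARS_HOME/kafka_2.10-0.8.2.1.jar,$JARS_HOME/kafka-clients-0.8.2.1.jar
# ...plus the other dependency jars from the question

dse spark-submit -v \
 --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
 --deploy-mode cluster \
 --master spark://$MASTER:7077 \
 --jars "$JARS" \
 --files "/home/test/input.json" \
 --supervise \
 --class "com.testclass" \
 "$APP_JAR" input.json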

Upvotes: 7
