Vish

Reputation: 184

InvalidClassException while running spark job using spark submit

I am trying to run a Spark job that loads data into a Cassandra table, but it is giving the following error.

java.io.InvalidClassException: com.datastax.spark.connector.rdd.CassandraRDD; local class incompatible: stream classdesc serialVersionUID = 8438672869923844256, local class serialVersionUID = -8421059671969407806

I have three Spark nodes and I used the following script to run it.

spark-submit --class test.testchannels --master spark://ubuntu:7077 --deploy-mode client --jars /home/user/BigData/jars/spark-cassandra-connector_2.11-2.0.0-RC1.jar,/home/user/BigData/jars/spark-cassandra-connector-java_2.10-1.6.0-M1.jar /home/user/BigData/SparkJobs/testchannelsparksubmit.jar /home/user/Data/channel_30Jun2017.csv

I have also copied the Cassandra-related jars to the same path on the worker nodes.

Upvotes: 1

Views: 247

Answers (1)

RussS

Reputation: 16576

You have mixed versions of the Spark Cassandra Connector (SCC) in your cluster. The jars locally have one definition of CassandraRDD and the remote ones have a different version. It is highly recommended that you do not copy jars into your Spark worker directories, as it is very easy to make this sort of mistake. It's much simpler to use the --packages option and allow Spark to distribute your resources.

/home/user/BigData/jars/spark-cassandra-connector_2.11-2.0.0-RC1.jar,/home/user/BigData/jars/spark-cassandra-connector-java_2.10-1.6.0-M1.jar

This is most likely the culprit: you are not only combining two different versions of the connector, but those artifacts also target two different versions of Spark and two different Scala versions (2.11 and 2.10, as the artifact names show). After 1.6.0, all of the "java" modules were merged into the core module, so there is no need for the -java artifact. In addition, RC1 is not the released version of the connector (it is Release Candidate 1); you should be using 2.0.2, which is the latest release as of this post.
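A minimal sketch of the recommended submission, assuming your Spark build uses Scala 2.11 (the class name, master URL, and file paths are taken from the question; adjust the Scala suffix to match your cluster):

```shell
# Let spark-submit resolve the connector from Maven Central and ship it
# to the driver and every executor, instead of hand-copying jars to workers.
spark-submit \
  --class test.testchannels \
  --master spark://ubuntu:7077 \
  --deploy-mode client \
  --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.2 \
  /home/user/BigData/SparkJobs/testchannelsparksubmit.jar \
  /home/user/Data/channel_30Jun2017.csv
```

Because --packages distributes a single, consistent artifact to every node, all JVMs see the same CassandraRDD class definition and the serialVersionUID mismatch goes away.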

Upvotes: 2
