Reputation: 569
TL;DR: Is it absolutely necessary that the Spark running spark-shell (the driver) have exactly the same version as the Spark master?
I am using Spark 1.5.0 to connect to Spark 1.5.0-cdh5.5.0 via spark-shell:
spark-shell --master spark://quickstart.cloudera:7077 --conf "spark.executor.memory=256m"
It connects and instantiates the SparkContext and sqlContext fine. If I run:
sqlContext.sql("show tables").show()
it shows all my tables as expected.
However, if I try to access data from a table:
sqlContext.sql("select * from t1").show()
I get this error:
java.io.InvalidClassException: org.apache.spark.sql.catalyst.expressions.AttributeReference; local class incompatible: stream classdesc serialVersionUID = 370695178000872136, local class serialVersionUID = -8877631944444173448
The error says the serialVersionUIDs don't match. My hypothesis is that the problem is caused by connecting two different versions of Spark. Am I right?
Upvotes: 5
Views: 251
Reputation: 962
You are absolutely right.
In your spark-shell, the driver tries to deserialize an object that was serialized by your workers (the cluster). Because the class versions on the two sides differ, you get the java.io.InvalidClassException.
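As a minimal sketch of the failure mode (using a hypothetical Attr class, not Spark's actual AttributeReference): Java serialization records the writer's serialVersionUID in the stream and compares it against the reader's local class, and any mismatch aborts deserialization.

```scala
import java.io._

// Hypothetical serializable class standing in for a Spark catalyst class.
@SerialVersionUID(370695178000872136L)
case class Attr(name: String)

object SerDemo extends App {
  // "Worker" side: serialize an instance of the class.
  val buf = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(buf)
  oos.writeObject(Attr("t1.col"))
  oos.close()

  // "Driver" side: deserialize. Same class definition on both sides,
  // so this succeeds and prints Attr(t1.col).
  val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
  println(in.readObject())

  // Had the bytes been written by a build of Attr with a different
  // @SerialVersionUID (e.g. -8877631944444173448L), readObject() would
  // throw java.io.InvalidClassException: "local class incompatible:
  // stream classdesc serialVersionUID = ..." -- which is exactly what
  // happens when driver and executors run different Spark builds.
}
```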
Use the same Spark version on the driver and on the cluster and it will be fine.
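As a quick sanity check (my own suggestion, assuming your cluster layout): compare the driver's version reported inside spark-shell with the version the cluster reports, e.g. on the master web UI (typically http://quickstart.cloudera:8080) or via spark-submit --version on a cluster node.

```scala
// In the spark-shell that is connected to the cluster:
// the version of the driver-side Spark libraries.
sc.version   // should print "1.5.0-cdh5.5.0" if it matches the cluster

// On a cluster node (from a shell, not the REPL):
//   $ spark-submit --version
// or read the version shown in the master web UI header.
// Both sides must report the same Spark build.
```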
Upvotes: 5