Reputation: 569
TL;DR: Is it absolutely necessary that the Spark running spark-shell (the driver) have exactly the same version as the Spark master?
I am using Spark 1.5.0 to connect to Spark 1.5.0-cdh5.5.0 via spark-shell:
spark-shell --master spark://quickstart.cloudera:7077 --conf "spark.executor.memory=256m"
It connects and instantiates the SparkContext and sqlContext fine. If I run:
sqlContext.sql("show tables").show()
it shows all my tables as expected.
However, if I try to access data from a table:
sqlContext.sql("select * from t1").show()
I get this error:
java.io.InvalidClassException: org.apache.spark.sql.catalyst.expressions.AttributeReference; local class incompatible: stream classdesc serialVersionUID = 370695178000872136, local class serialVersionUID = -8877631944444173448
The error says the serialVersionUIDs don't match. My hypothesis is that the problem is caused by connecting two different versions of Spark. Am I right?
Upvotes: 5
Views: 251
Reputation: 962
You are absolutely right.
In your spark-shell, the driver tries to deserialize an object that was serialized by your workers (the cluster). Because the class versions on the two sides differ, you get the java.io.InvalidClassException.
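As a minimal sketch of the failure mode (using a hypothetical Attr class, not Spark's actual AttributeReference): Java serialization records the writer's serialVersionUID in the stream and compares it against the reader's local class, and any mismatch aborts deserialization.

```scala
import java.io._

// Hypothetical serializable class standing in for a Spark catalyst class.
@SerialVersionUID(370695178000872136L)
case class Attr(name: String)

object SerDemo extends App {
  // "Worker" side: serialize an instance of the class.
  val buf = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(buf)
  oos.writeObject(Attr("t1.col"))
  oos.close()

  // "Driver" side: deserialize. Same class definition on both sides,
  // so this succeeds and prints Attr(t1.col).
  val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
  println(in.readObject())

  // Had the bytes been written by a build of Attr with a different
  // @SerialVersionUID (e.g. -8877631944444173448L), readObject() would
  // throw java.io.InvalidClassException: "local class incompatible:
  // stream classdesc serialVersionUID = ..." -- which is exactly what
  // happens when driver and executors run different Spark builds.
}
```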
Use the same Spark version on the driver and on the cluster and it will be fine.
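As a quick sanity check (my own suggestion, assuming your cluster layout): compare the driver's version reported inside spark-shell with the version the cluster reports, e.g. on the master web UI (typically http://quickstart.cloudera:8080) or via spark-submit --version on a cluster node.

```scala
// In the spark-shell that is connected to the cluster:
// the version of the driver-side Spark libraries.
sc.version   // should print "1.5.0-cdh5.5.0" if it matches the cluster

// On a cluster node (from a shell, not the REPL):
//   $ spark-submit --version
// or read the version shown in the master web UI header.
// Both sides must report the same Spark build.
```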
Upvotes: 5