Saurabh

Reputation: 199

Is there a version compatibility issue between Spark/Hadoop/Scala/Java/Python?

I'm getting an error while running the spark-shell command through cmd, and I've had no luck fixing it so far. I have Python, Java, Spark, Hadoop (winutils.exe), and Scala installed, with versions as shown in the paths below:

I followed the steps below (a sketch of the equivalent cmd commands follows the list) and ran spark-shell through cmd from C:\Program Files\spark-3.2.0-bin-hadoop3.2\bin>:

  1. Create JAVA_HOME variable: C:\Program Files\Java\jdk1.8.0_311\bin
  2. Add the following part to your path: %JAVA_HOME%\bin
  3. Create SPARK_HOME variable: C:\spark-3.2.0-bin-hadoop3.2\bin
  4. Add the following part to your path: %SPARK_HOME%\bin
  5. The most important part: the Hadoop path should include the bin folder that contains winutils.exe, as follows: C:\Hadoop\bin. Make sure winutils.exe is located inside this path.
  6. Create HADOOP_HOME Variable: C:\Hadoop
  7. Add the following part to your path: %HADOOP_HOME%\bin
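For reference, here is a sketch of steps 1–7 as one-off cmd commands (paths taken from the list above; note that JAVA_HOME is conventionally set to the JDK root rather than its \bin folder, since the PATH entry already appends \bin):

    :: Sketch: create the three variables (run in cmd, then open a NEW window).
    :: JAVA_HOME points at the JDK root, not its \bin, because the Path
    :: entries below append \bin themselves.
    setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_311"
    setx SPARK_HOME "C:\spark-3.2.0-bin-hadoop3.2"
    setx HADOOP_HOME "C:\Hadoop"
    :: Add %JAVA_HOME%\bin, %SPARK_HOME%\bin and %HADOOP_HOME%\bin to Path via
    :: System Properties > Environment Variables (setx PATH is risky: it
    :: expands variables immediately and truncates long values).
    :: Then, in a new cmd window, verify:
    echo %JAVA_HOME%
    java -version
    where winutils.exe
    spark-submit --version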

Am I missing anything? I've posted my question with the error details in another thread (spark-shell command throwing this error: SparkContext: Error initializing SparkContext).

Upvotes: 2

Views: 4461

Answers (1)

jgp

Reputation: 2091

You went the difficult way by installing everything by hand. You may need Scala too; be extremely vigilant about the version you are installing. From your example, it seems like it should be Scala 2.12.
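A quick way to confirm the Scala version from cmd (assuming a standalone Scala is installed and on the path; spark-shell also ships with its own bundled Scala):

    :: Should report 2.12.x for the default Spark 3.2.0 build.
    scala -version
    :: spark-shell also prints its bundled Scala version in its welcome banner.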

But you are right: Spark is extremely demanding in terms of version matching. Java 8 is good. Java 11 is OK too, but not any more recent version.
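To check which Java spark-shell will pick up, something like this from cmd (exact output strings vary by JDK vendor):

    :: Spark 3.2.x runs on Java 8 or 11 only.
    java -version
    :: Expect the first line to look like one of:
    ::   java version "1.8.0_311"     (Java 8)
    ::   openjdk version "11.0.13"    (Java 11)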

Alternatively, you can:

  1. Try a very simple app like the one in https://github.com/jgperrin/net.jgp.books.spark.ch01
  2. Use Docker with a pre-made image; if your goal is to do Python, I would recommend an image with Jupyter and Spark preconfigured together (see the sketch after this list).
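For option 2, a minimal sketch using the community jupyter/pyspark-notebook image (assumes Docker Desktop is installed and running; this image name is one common choice, not the only one):

    :: Pull and run a Jupyter + PySpark image, exposing the notebook port.
    docker run -it --rm -p 8888:8888 jupyter/pyspark-notebook
    :: Then open the http://127.0.0.1:8888/lab?token=... URL printed in the log.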

Upvotes: 1
