Reputation: 63
I am trying to set up and run a Spark cluster on top of YARN, using HDFS.
I first set up Hadoop for HDFS using hadoop-3.1.0, then configured YARN and started both. I was able to upload data to HDFS, and YARN also seems to work fine.
Then I installed spark-2.3.0-bin-without-hadoop on my master only and tried to submit an application. Since it is Spark without Hadoop, I had to modify spark-env.sh by adding the following line, as mentioned in the documentation:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
With only this line I got the following error:
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
I guess this means it cannot find the Spark libraries, so I added the Spark jars to the classpath:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath):/usr/local/spark/jars/*
But now I get the following exception:
com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.7.8
As it turns out, Hadoop 3.1.0 provides Jackson 2.7.8, while Spark 2.3.0 provides Jackson 2.6.7. As I see it, both are now on the classpath, resulting in a conflict.
Since it seems I really need both the Hadoop and the Spark libraries to submit anything, I do not know how to get around this problem.
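To make the conflict concrete, here is a minimal sketch (using throwaway directories as stand-ins for the real Hadoop and Spark installs) of how the combined classpath ends up carrying two jackson-databind jars:

```shell
# Stand-in directories; the real ones would be the Hadoop and Spark installs.
demo=$(mktemp -d)
mkdir -p "$demo/hadoop/lib" "$demo/spark/jars"
touch "$demo/hadoop/lib/jackson-databind-2.7.8.jar"   # stand-in for the Hadoop 3.1.0 jar
touch "$demo/spark/jars/jackson-databind-2.6.7.jar"   # stand-in for the Spark 2.3.0 jar

# Mirrors: export SPARK_DIST_CLASSPATH=$(hadoop classpath):/usr/local/spark/jars/*
CLASSPATH="$demo/hadoop/lib/*:$demo/spark/jars/*"

# List every jackson-databind jar the JVM would see on this classpath:
for entry in ${CLASSPATH//:/ }; do ls $entry 2>/dev/null; done | grep jackson-databind
```

Both jars show up, i.e. two incompatible Jackson versions end up on one classpath, which is exactly what the JsonMappingException complains about.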
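For the first error (the missing ApplicationMaster class), the Spark documentation describes `spark.yarn.jars` as the way to let YARN distribute the Spark jars itself, instead of putting them on SPARK_DIST_CLASSPATH. A sketch with an illustrative HDFS path (note this would not resolve the Jackson clash, since Hadoop's Jackson still ends up on the classpath):

```
# spark-defaults.conf (illustrative path; the jars must be uploaded first,
# e.g. with: hdfs dfs -mkdir -p /spark-jars && hdfs dfs -put /usr/local/spark/jars/* /spark-jars/)
spark.yarn.jars  hdfs:///spark-jars/*.jar
```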
Upvotes: 1
Views: 1108
Reputation: 4532
In How is Hadoop-3.0.0 's compatibility with older versions of Hive, Pig, Sqoop and Spark, there was an answer from @JacekLaskowski saying that Spark is not supported on Hadoop 3. As far as I know, nothing has changed in that area over the last six months.
Upvotes: 3