Windsaw

Reputation: 63

Jar conflicts between Apache Spark and Hadoop

I am trying to set up and run a Spark cluster on top of YARN, using HDFS.

I first set up Hadoop for HDFS using hadoop-3.1.0, then configured YARN and started both. I was able to upload data to HDFS, and YARN also seems to work fine.
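For completeness, this is roughly how I checked it (sample.txt and /data are just placeholder names from my test):

# upload a local file to HDFS (placeholder file and directory names)
/usr/local/hadoop/bin/hdfs dfs -mkdir -p /data
/usr/local/hadoop/bin/hdfs dfs -put sample.txt /data/
# list the registered YARN NodeManagers
/usr/local/hadoop/bin/yarn node -list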

Then I installed spark-2.3.0-bin-without-hadoop on my master node only and tried to submit an application. Since it is Spark without Hadoop, I had to modify spark-env.sh, adding the following line as mentioned in the documentation:

export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
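That command just prints Hadoop's own configuration directory and jar directories as a colon-separated list, roughly like this (a sketch for my /usr/local/hadoop layout; the exact entries depend on the installation):

$ /usr/local/hadoop/bin/hadoop classpath
/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:...

So it contains only Hadoop's jars, nothing from Spark.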

With only that SPARK_DIST_CLASSPATH setting, I got the following exception:

Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster

I guess this means that it cannot find the Spark libraries, so I added the Spark jars to the classpath:

export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath):/usr/local/spark/jars/*
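A quick way to inspect what actually ends up on the classpath now (just a sketch using my paths; tr splits the colon-separated list into one entry per line):

# print each classpath entry on its own line
echo "$(/usr/local/hadoop/bin/hadoop classpath):/usr/local/spark/jars/*" | tr ':' '\n'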

But now, when submitting, I get the following exception:

com.fasterxml.jackson.databind.JsonMappingException: Incompatible Jackson version: 2.7.8

As it turns out, Hadoop 3.1.0 ships Jackson 2.7.8 while Spark 2.3.0 ships Jackson 2.6.7. As I see it, both are now on the classpath, resulting in a conflict.
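This can be confirmed by listing the jackson-databind jars each distribution ships (a sketch, assuming my /usr/local install locations):

# Jackson jars bundled with Hadoop
find /usr/local/hadoop/share/hadoop -name 'jackson-databind-*.jar'
# Jackson jars bundled with Spark
find /usr/local/spark/jars -name 'jackson-databind-*.jar'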

Since it seems I really need both the Hadoop and Spark libraries to submit anything, I do not know how to get around that problem.

Upvotes: 1

Views: 1108

Answers (1)

Natalia

Reputation: 4532

How is Hadoop-3.0.0's compatibility with older versions of Hive, Pig, Sqoop and Spark

In that question there was an answer from @JacekLaskowski saying that Spark is not supported on Hadoop 3. As far as I know, nothing has changed in that area over the last 6 months.

Upvotes: 3
